Create Python bindings for my C++ code with PyBind11

Lena · Feb 13 '23 · DEV Community

Article::Article

Back in the day, when I was still a student, near the end of one of my internships, instead of writing my report I wrote a spelling-mistake generator for French named YololTranslator. You probably have no idea what it looks like: if you speak French, I invite you to visit the online version. Otherwise, imagine that at first it would write "your" instead of "you're" and stuff like that (but it became way more powerful).

It started as a CLI tool written in C++, then I wrote a Discord bot using it, after that an API, and finally a friend created the frontend for the website. The spelling-mistake generator was still in C++, but everything else was in JavaScript with Node.js. Using C++ code from Node.js was a pain in the ass, especially when I had different versions of Node on my computer and on my Raspberry Pi.

Each time I improved the spelling-mistake engine, deploying it was no fun at all. One day I got tired of it and decided to rewrite the Discord bot and the backend in Python. I liked Python (and still do), there are good libraries to create a Discord bot and a backend (FastAPI is fantastic), and calling C functions from Python is trivial. At the same time I created a Docker image for the bot and the website; deployment is way easier now, but that's another story I may tell one day.

This article is about how I used my C++ functions in my Python scripts.

Initial implementation with ctypes

Overview

C is nowadays the lingua franca of programming: in almost any language, you can easily call C functions from a shared library. Python is no exception; it has a dedicated module for that, ctypes. That's why I chose it for my initial solution.
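To make the idea concrete before touching any custom library, here is a minimal sketch that calls `strlen` from the C standard library through ctypes. It assumes a POSIX system (Linux or macOS): `ctypes.util.find_library("c")` locates libc, and if it returns `None`, `ctypes.CDLL(None)` falls back to the symbols already loaded into the interpreter.

```python
import ctypes
import ctypes.util

# Locate the C standard library (e.g. "libc.so.6" on Linux).
# If find_library returns None, CDLL(None) exposes the symbols already
# loaded in the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declaring the prototype lets ctypes convert and check the arguments.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"Salut"))  # 5
```

The same two attributes, `argtypes` and `restype`, are what the YololTranslator bindings rely on later.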

Create the C interface

To create the C interface, I needed functions with C linkage. It's really simple to do: write declarations that use only C-compatible types and put them inside an extern "C" block. Then you can write whatever C++ you want in the implementation. The only limitation is on argument and return types: no std::string, no classes, only good old const char* and structs without any methods.

My higher-level C++ class looked like this:

class Translator
{
    public:
        Translator(std::string_view phonems_list_filename, std::string_view words_phonem_filename, std::string_view word_dict_filename);

        std::string operator()(std::string_view sentence_to_translate) const;
        std::string translate_word(std::string_view word_to_translate) const;

    private:
        WordTranslator _word_translator;
        WordToPhonems _word_to_phonem;
        PhonemListToString _phonem_list_to_string;
};

Enter fullscreen mode Exit fullscreen mode

It's pretty straightforward: you construct it, then you can generate errors for an entire sentence or for a single word. The compiler also generates the destructor and the copy and move operations automatically (but no default constructor, since a constructor is user-declared).

The C interface will look a lot like this:

extern "C"
{

typedef struct YololTranslationS
{
    int size;
    char* translation;
} YololTranslation;

void yolol_init(const char* phonems_to_chars, const char* word_to_phonem, const char* word_to_word);

YololTranslation yolol_translate(const char* str, int size);
void yolol_free(YololTranslation translation);

void yolol_destroy();

} // extern "C"
Enter fullscreen mode Exit fullscreen mode

We have:

  • YololTranslation, containing the result of a translation: a dynamically allocated char* and its size. The string is null-terminated anyway, but I think passing the size along is good practice.
  • yolol_init, acting like the constructor. For simplicity's sake, only one Translator can exist at a time; it was (and still is) the only use case.
  • yolol_translate, to generate the errors.
  • yolol_free, because the char* in YololTranslation is dynamically allocated.
  • yolol_destroy, acting like the destructor.
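Why a dedicated yolol_free? Memory allocated on the C side (here by strdup, i.e. malloc) has to be released by the same C runtime. The sketch below shows that ownership rule with libc's own `strdup` and `free` through ctypes; it assumes a POSIX libc and uses `duplicate_and_release` as a hypothetical helper name. Note the `c_void_p` return type: with `c_char_p`, ctypes would hand back a bytes copy and the address needed by `free` would be lost.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))  # POSIX only

# Keep the raw pointer so it can be freed afterwards.
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def duplicate_and_release(data: bytes) -> bytes:
    """Let C allocate a copy of `data`, read it back, then let C free it."""
    ptr = libc.strdup(data)  # allocated with malloc by the C runtime
    if not ptr:
        raise MemoryError("strdup failed")
    try:
        return ctypes.string_at(ptr)  # copy into a Python bytes object
    finally:
        libc.free(ptr)  # C frees what C allocated


print(duplicate_and_release(b"salut"))  # b'salut'
```

yolol_translate and yolol_free follow the same pattern: the library allocates, Python copies the bytes it needs, and the library frees.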

And then in the implementation I just use my Translator class:

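#include <Translator.hpp>

#include <cstdlib>      // free
#include <optional>
#include <string.h>     // strdup (POSIX) / _strdup (MSVC)
#include <string_view>

// The extern "C" declarations shown above must be visible in this file
// so that these definitions get C linkage.

static std::optional<Translator> translator;

void yolol_init(const char* phonems_to_chars, const char* word_to_phonem, const char* word_to_word)
{
    translator.emplace(phonems_to_chars, word_to_phonem, word_to_word);
}

YololTranslation yolol_translate(const char* str, int size)
{
    // yolol_init must have been called first, otherwise *translator is undefined behavior
    auto translation = (*translator)(std::string_view(str, static_cast<std::size_t>(size)));
    YololTranslation result;
    result.size = static_cast<int>(translation.size());
    #if defined(_WIN32)
        result.translation = _strdup(translation.data());
    #else
        result.translation = strdup(translation.data());
    #endif
    return result;
}

void yolol_free(YololTranslation translation)
{
    // The buffer was allocated by strdup, so it must be released with free
    free(reinterpret_cast<void*>(translation.translation));
}

void yolol_destroy()
{
    translator.reset();
}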

Enter fullscreen mode Exit fullscreen mode

I made sure in my CMakeLists.txt that I created a shared library and not a static one:

add_library(YololTranslator SHARED ${SRCS})
Enter fullscreen mode Exit fullscreen mode

The last step is to make sure the symbols of these functions are exported. With GCC and Clang, everything is exported by default; with MSVC (the Microsoft compiler), I added this line, before add_library, to make sure they were:

set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
Enter fullscreen mode Exit fullscreen mode

I know it's ugly, but it's a little pet project, it's fast to do and it works, I did not see any harm.

Use the C interface from Python

As I said, Python has a module named ctypes to call C functions from a shared library easily, if verbosely.

To load a shared library you just need its path:

lib = ctypes.CDLL(lib_path)
Enter fullscreen mode Exit fullscreen mode
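Since the whole point was to deploy on several machines, the path itself depends on the platform. Here is a small sketch that builds the file name CMake gives a SHARED library by default; it assumes no custom PREFIX/SUFFIX target properties, an MSVC build on Windows, and a hypothetical "build" output directory. The helper name `shared_library_name` is made up for illustration.

```python
import sys
from pathlib import Path


def shared_library_name(target: str, platform: str = sys.platform) -> str:
    """Default file name CMake produces for add_library(<target> SHARED ...)."""
    if platform.startswith("win"):
        return f"{target}.dll"          # MSVC: no "lib" prefix
    if platform == "darwin":
        return f"lib{target}.dylib"
    return f"lib{target}.so"            # Linux and other ELF platforms


# Hypothetical layout: the library is built into ./build
lib_path = Path("build") / shared_library_name("YololTranslator")
print(lib_path)
# The library would then be loaded with: lib = ctypes.CDLL(str(lib_path))
```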

To define a structure you just need to create a class inheriting from ctypes.Structure and declare the attributes:

class YololTranslation(ctypes.Structure):
    _fields_ = [
        ('size', ctypes.c_int),
        ('translation', ctypes.c_char_p)
    ]
Enter fullscreen mode Exit fullscreen mode
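A quick way to check that the Python declaration matches the C struct is to build one by hand and inspect its layout. This sketch reuses the YololTranslation declaration above; the size comparison assumes a 4-byte int, which holds on the usual 32- and 64-bit ABIs.

```python
import ctypes


class YololTranslation(ctypes.Structure):
    _fields_ = [
        ('size', ctypes.c_int),
        ('translation', ctypes.c_char_p)
    ]


item = YololTranslation(size=5, translation=b"salut")
print(item.size)         # 5
print(item.translation)  # b'salut' (reading a c_char_p field returns a bytes copy)

# ctypes applies the C alignment rules: the pointer is padded to its own
# alignment after the int, so the struct ends up two pointers wide.
print(YololTranslation.translation.offset == ctypes.alignment(ctypes.c_char_p))  # True
print(ctypes.sizeof(YololTranslation) == 2 * ctypes.sizeof(ctypes.c_void_p))     # True
```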

And to declare and use a function, you can just do this:

# "lib" was created using ctypes.CDLL as shown previously
# ctypes.c_char_p corresponds to a const char* and ctypes.c_int to an int
lib.yolol_translate.argtypes = [ctypes.c_char_p, ctypes.c_int]
# "YololTranslation" is the structure declared in the previous example
lib.yolol_translate.restype = YololTranslation
# yolol_free takes the structure by value and releases its buffer
lib.yolol_free.argtypes = [YololTranslation]
lib.yolol_free.restype = None

# A char* is just an array of bytes, so you need to encode your string to get one
buff = "Salut, je veux des fautes!".encode()
# Call the C function
translation_result = lib.yolol_translate(buff, len(buff))
# Reading the c_char_p field returns a bytes copy, so decode it before freeing
str_with_spelling_mistake = translation_result.translation.decode()
# The C side allocated the buffer with strdup, so hand it back
lib.yolol_free(translation_result)
# Possible content of str_with_spelling_mistake:
# "salu  je veu dè phaut!!"
Enter fullscreen mode Exit fullscreen mode
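Calling the raw functions everywhere makes it easy to forget yolol_free or yolol_destroy. One option is a thin wrapper class; the sketch below is a hypothetical `CTranslator` (not from the original project) that assumes `lib` is the ctypes.CDLL object with the yolol_* prototypes declared. Because the C side keeps a single global Translator, only one instance should be alive at a time.

```python
import ctypes


class YololTranslation(ctypes.Structure):
    _fields_ = [
        ('size', ctypes.c_int),
        ('translation', ctypes.c_char_p)
    ]


class CTranslator:
    """Hypothetical wrapper pairing yolol_init/yolol_destroy and yolol_translate/yolol_free."""

    def __init__(self, lib, phonems_to_chars, word_to_phonem, word_to_word):
        self._lib = lib
        lib.yolol_init(phonems_to_chars.encode(), word_to_phonem.encode(), word_to_word.encode())

    def translate(self, sentence):
        buff = sentence.encode()
        result = self._lib.yolol_translate(buff, len(buff))
        try:
            # Reading the c_char_p field copies the bytes into Python
            return result.translation.decode()
        finally:
            # Always give the strdup'ed buffer back to the C side
            self._lib.yolol_free(result)

    def close(self):
        self._lib.yolol_destroy()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Used as `with CTranslator(lib, ...) as t: print(t.translate("Salut"))`, the destroy call happens even if a translation raises.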

You can see the file with the complete code here.

Why rewrite it?

I just wanted to try PyBind11 on something other than a "hello world" project. That's it; it didn't take much to convince myself.

Use PyBind11

Installation

At first, I wanted to install it using Vcpkg, but I ended up using CPM.cmake instead. Why? Because on some platforms Vcpkg messes up the configuration and picks the Python version shipped with Vcpkg instead of the one I wanted to use. It's not that hard to fix, but why bother when I can install the library in one line with CPM.cmake:

CPMAddPackage("gh:pybind/pybind11#v2.10.3")
Enter fullscreen mode Exit fullscreen mode

If you want to learn more about CPM.cmake I advise you to read this article.

And if you really want to install it with Vcpkg but run into the same problem as me (no error, but the module was not created, which was so much fun to figure out -_-) because it was apparently picking the wrong Python, this line, placed before pybind11 is found, may help you:

find_package(Python COMPONENTS Interpreter Development)
Enter fullscreen mode Exit fullscreen mode
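CMake's FindPython module also accepts a `Python_EXECUTABLE` hint, which pins the interpreter explicitly. The small helper below is a sketch (the name `cmake_python_hint` is hypothetical) that prints the interpreter you are currently running and the matching `-D` flag, quoted for a POSIX shell.

```python
import shlex
import sys


def cmake_python_hint(executable: str = sys.executable) -> str:
    """Return a -D flag pointing CMake's FindPython at a specific interpreter."""
    return shlex.quote(f"-DPython_EXECUTABLE={executable}")


print(sys.version.split()[0], sys.executable)
print("cmake -S . -B build", cmake_python_hint())
```

Running it with the interpreter that will later import the module makes the build and the runtime agree on the Python version.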

Create a module with CMake

This part is really simple. pybind11 provides a CMake function, pybind11_add_module, that works like add_library or add_executable.
In my project I did this:

# You can use as many source files as you want
pybind11_add_module(yolol source_file_1)
# You can link it with a library as you would do with a regular target
target_link_libraries(yolol PUBLIC YololTranslator)
Enter fullscreen mode Exit fullscreen mode
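pybind11_add_module names the output file with the interpreter's extension suffix, so the built module is not called libyolol.so but something like yolol.cpython-310-x86_64-linux-gnu.so (or yolol.pyd-style names on Windows). This sketch, with hypothetical helper names, lists the file names the running interpreter accepts and looks for one in a build directory.

```python
from __future__ import annotations

import importlib.machinery
from pathlib import Path


def extension_filenames(module: str) -> list[str]:
    """Every file name this interpreter would accept for an extension module."""
    return [module + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def find_extension(module: str, directory: str) -> Path | None:
    """Return the first built extension for `module` found in `directory`, if any."""
    for name in extension_filenames(module):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


print(extension_filenames("yolol"))
# On CPython 3.10 / x86-64 Linux, for example:
# ['yolol.cpython-310-x86_64-linux-gnu.so', 'yolol.abi3.so', 'yolol.so']
```

If `find_extension("yolol", "build")` returns None, the module was either not built or built for a different Python version.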

Create the bindings in the C++ code

The most basic example from the documentation is this:

#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(example, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring

    m.def("add", &add, "A function that adds two numbers");
}
Enter fullscreen mode Exit fullscreen mode

It creates a module named example, and in this module you have a function named add. The prototype of the function is deduced, so you just need to pass a pointer to the function, nothing else.

That's cool, but I don't want to export a function, I want to export a class with a constructor and a member function. That's almost as easy:

#include <Translator.hpp>

#include <pybind11/pybind11.h>

// Because I'm lazy and there is no risk of name collision
namespace py = pybind11;

PYBIND11_MODULE(yolol, m) {
    m.doc() = "Yolol bindings";

    py::class_<Translator>(m, "Translator")
        .def(py::init<std::string_view, std::string_view, std::string_view>())
        .def("translate", &Translator::operator());
}
Enter fullscreen mode Exit fullscreen mode

I created the class with py::class_, giving it the name it will have in Python, then defined the constructor, which corresponds to the __init__ method in Python. For the constructor, I need to specify the argument types.
Then I added a method named translate bound to operator(). You may find it weird, but it works well because an overloaded operator is just a regular member function.

I can still improve this a little. The constructor has three arguments of the same type, which can cause confusion: I might mix them up. Python has named arguments, so let's use them:

#include <Translator.hpp>

#include <pybind11/pybind11.h>

namespace py = pybind11;

PYBIND11_MODULE(yolol, m) {
    m.doc() = "Yolol bindings";

    py::class_<Translator>(m, "Translator")
        .def(py::init<std::string_view, std::string_view, std::string_view>(),
            py::arg("phonem_to_chars_file"),
            py::arg("word_to_phonem_file"),
            py::arg("word_to_word_file"))
        .def("translate", &Translator::operator());
}
Enter fullscreen mode Exit fullscreen mode

I just needed to add some py::arg with the name of the argument, that's all. Easy, isn't it?

Use the bindings in the Python code

Now that the bindings are created, I want to use them. I just need to import the module, and then I can use my Translator class like a normal Python class:

from yolol import Translator
# If the .so/.pyd is in a subdirectory that is a package reachable from sys.path,
# use a dotted path instead of slashes:
# from dir.subdir.yolol import Translator

# Create the Translator
translator = Translator(word_to_phonem_file='a_path.txt', phonem_to_chars_file='another/path.json', word_to_word_file='foo/bar/beer.json')

# Use it
result = translator.translate("Le reblochon c'est trop bon.")
print(result)  # Possible output: "l reblochon sê trrau bon"
Enter fullscreen mode Exit fullscreen mode
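When the built module sits somewhere that is not a package on sys.path (a CMake build directory, for instance), it can also be loaded directly from its path. This is a sketch based on the importlib "import a source file directly" recipe; `load_module_from_path` is a hypothetical name. For a compiled extension, `name` must match the name given to PYBIND11_MODULE ("yolol" here), because the loader looks up the `PyInit_<name>` symbol.

```python
import importlib.util
import sys
from types import ModuleType


def load_module_from_path(name: str, path: str) -> ModuleType:
    """Import the file at `path` as module `name` without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Hypothetical path to the built extension:
# yolol = load_module_from_path("yolol", "build/yolol.cpython-310-x86_64-linux-gnu.so")
# translator = yolol.Translator(...)
```

The loader is picked from the file suffix, so the same helper works for .py files and for .so/.pyd extensions.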

Some notes that may help you

-fPIC missing

If you have an error like this while building:

warning: relocation against `_ZTVN8nlohmann16json_abi_v3_11_26detail10type_errorE' in read-only section `.text.unlikely'
/usr/bin/ld: src/libYololTranslator.a(WordTranslator.cpp.o): relocation R_X86_64_PC32 against symbol `_ZZNSt8__detail18__to_chars_10_implImEEvPcjT_E8__digits' can not be used when making a shared object; recompile with -fPIC
Enter fullscreen mode Exit fullscreen mode

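That's because, as the error says, the -fPIC flag is missing: the static library linked into the Python module (which is a shared object) must be compiled as position-independent code. The clean way to enable it in CMake is: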

# For the whole project
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
# For just a target
set_target_properties(my_target PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
Enter fullscreen mode Exit fullscreen mode

Missing Python header

If you have an error like this:

CMake Error in CMakeLists.txt:
Imported target "pybind11::module" includes non-existent path
  "/usr/include/python3.10"
in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
Enter fullscreen mode Exit fullscreen mode

It means that the Python headers are not installed. On Ubuntu (and probably every Debian-based distribution), you can fix this by installing them with apt. In a Dockerfile:

RUN apt-get update && apt-get install -y python3-dev
Enter fullscreen mode Exit fullscreen mode
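To check ahead of time whether the headers are there for the interpreter you are actually building against, you can ask that interpreter where it expects them. A small diagnostic sketch (the helper name is hypothetical) using the standard `sysconfig` module:

```python
from __future__ import annotations

import sysconfig
from pathlib import Path


def python_headers_status() -> tuple[str, bool]:
    """Return the include directory this interpreter reports and whether Python.h is in it."""
    include_dir = sysconfig.get_paths()["include"]
    return include_dir, (Path(include_dir) / "Python.h").is_file()


include_dir, found = python_headers_status()
print(include_dir)  # e.g. /usr/include/python3.10 for the Ubuntu 22.04 system Python
print("Python.h found" if found else "Python.h missing: install the matching -dev package")
```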

Article::~Article

We have seen two ways to call my C++ code from Python: the ctypes module and PyBind11. In both cases the example was quite trivial and, especially for PyBind11, we did not use these tools to their full potential; there are a lot of other features we did not cover. Consider this article an introduction to the wonderful world of bindings between C++ and Python.

Note that PyBind11 is not the only library for creating bindings between C++ and Python; there are also:

  • Boost.Python, which is kind of the predecessor of PyBind11
  • Nanobind, made by the creator of PyBind11: it has a similar interface, but it leverages C++17 and aims for more efficient bindings in both size and speed.

Sources

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .