Article::Article
Back in the day, when I was still a student, at the end of one of my internships, instead of writing my report I wrote a spelling mistake generator for French named YololTranslator. You probably have no idea what it looks like. If you speak French, I invite you to visit the online version. Otherwise, imagine that at first it would write "your" instead of "you're" and stuff like that. (But it became way more powerful.)
It was a CLI tool written in C++. Then I wrote a Discord bot using it, after that an API, and finally a friend created the front end for the website. The spelling mistake generator was still in C++ but everything else was in JavaScript with Node.js. But using C++ code with Node.js was a pain in the ass, especially when I had a different version of Node between my computer and my Raspberry Pi.
Each time I improved the spelling mistake engine, deploying it was not fun at all. One day I got tired of it and decided to rewrite the Discord bot and the backend in Python. I liked Python (and still do), there are some good libraries to create a Discord bot and a backend (FastAPI is fantastic), and using C functions in Python is trivial. At the same time I created a Docker image for the bot and the website. Deployment is way easier now, but that's another story I may talk about one day.
This article is about how I used my C++ functions in my Python scripts.
Initial implementation with ctypes
Overview
C nowadays is the lingua franca for programming. In almost any language, you can easily use C functions from a shared library. Python is no exception: there is a dedicated module for that, ctypes. That's why I chose it for my initial solution.
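To give an idea of how little is needed, here is a minimal sketch that calls strlen from the C standard library. It assumes a POSIX-like system where the C library can be located; it has nothing to do with YololTranslator itself.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None,
# CDLL(None) falls back to the symbols already loaded in the process
# (this fallback only works on POSIX systems).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the prototype: size_t strlen(const char*)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# A Python str must be encoded to bytes before being passed as a char*
length = libc.strlen("Salut".encode())
print(length)
```

Declaring argtypes and restype is optional for simple cases, but it lets ctypes check the arguments and convert the result properly.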
Create the C interface
To create the C interface, I needed to create functions with C linkage. It's really simple to do: declare functions using only C-compatible code and put the declarations inside an extern "C" block. Then you can write whatever you want in the implementation. The only limitation is on the argument and return types: no std::string, no classes, only good old const char* and structs without any methods.
My higher-level C++ class looked like this:
class Translator
{
public:
Translator(std::string_view phonems_list_filename, std::string_view words_phonem_filename, std::string_view word_dict_filename);
std::string operator()(std::string_view sentence_to_translate) const;
std::string translate_word(std::string_view word_to_translate) const;
private:
WordTranslator _word_translator;
WordToPhonems _word_to_phonem;
PhonemListToString _phonem_list_to_string;
};
It's pretty straightforward: you initialize it, then you can generate errors for an entire sentence or for just one word. Note that because a constructor is declared, the compiler does not generate a default constructor; the destructor, however, is generated automatically.
The C interface will look a lot like this:
extern "C"
{
typedef struct YololTranslationS
{
int size;
char* translation;
} YololTranslation;
void yolol_init(const char* phonems_to_chars, const char* word_to_phonem, const char* word_to_word);
YololTranslation yolol_translate(const char* str, int size);
void yolol_free(YololTranslation translation);
void yolol_destroy();
} // extern "C"
We have:
- YololTranslation, containing the result of a translation, with a dynamically allocated char* and a size. Even if the string is null-terminated, I think storing the size is good practice.
- yolol_init, acting like the constructor. Note that, for simplicity's sake, you can use only one Translator at a time. It was (and still is) the only use case.
- yolol_translate, to generate the errors.
- yolol_free, because the char* in the YololTranslation is dynamically allocated.
- yolol_destroy, acting like the destructor.
And then in the implementation I just use my Translator class:
#include <cstdlib>  // free
#include <cstring>  // strdup
#include <optional>
#include <string_view>
static std::optional<Translator> translator;
void yolol_init(const char* phonems_to_chars, const char* word_to_phonem, const char* word_to_word)
{
translator.emplace(phonems_to_chars, word_to_phonem, word_to_word);
}
YololTranslation yolol_translate(const char* str, int size)
{
auto translation = (*translator)(std::string_view(str, static_cast<std::size_t>(size)));
YololTranslation result;
result.size = static_cast<int>(translation.size());
#if defined(_WIN32)
result.translation = _strdup(translation.data());
#else
result.translation = strdup(translation.data());
#endif
return result;
}
void yolol_free(YololTranslation translation)
{
free(reinterpret_cast<void*>(translation.translation));
}
void yolol_destroy()
{
translator.reset();
}
I made sure in my CMakeLists.txt that I created a shared library and not a static one:
add_library(YololTranslator SHARED ${SRCS})
The last step is to make sure the symbols of these functions are exported. With gcc and clang everything is exported by default, and with MSVC (the Microsoft compiler) I added this line to make sure they were:
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
I know it's ugly, but it's a little pet project, it's fast to do and it works, I did not see any harm.
Use the C interface from Python
As I said, Python has a module named ctypes that makes it easy, if verbose, to use C functions from a shared library.
To load a shared library you just need its path:
lib = ctypes.CDLL(lib_path)
To define a structure you just need to create a class inheriting from ctypes.Structure and declare the attributes:
class YololTranslation(ctypes.Structure):
    _fields_ = [
        ('size', ctypes.c_int),
        ('translation', ctypes.c_char_p)
    ]
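As a quick check that needs no shared library, the structure can also be instantiated directly from Python. This is only an illustration with a made-up value; in the real flow the instance comes back from yolol_translate.

```python
import ctypes


class YololTranslation(ctypes.Structure):
    _fields_ = [
        ("size", ctypes.c_int),
        ("translation", ctypes.c_char_p),
    ]


# Hypothetical content, built by hand instead of returned by the C library
sample = YololTranslation(size=4, translation=b"salu")

# c_int fields are read back as Python ints, c_char_p fields as bytes
print(sample.size, sample.translation.decode())
```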
And to declare and use a function, you can just do this:
# "lib" was created using ctypes.CDLL as shown previously
# ctypes.c_char_p corresponds to a const char* and ctypes.c_int to an int
lib.yolol_translate.argtypes = [ctypes.c_char_p, ctypes.c_int]
# "YololTranslation" was the declared structure in the previous example
lib.yolol_translate.restype = YololTranslation
# A char* is just an array of bytes, so you need to encode your string to get one
buff = "Salut, je veux des fautes!".encode()
# Call the C function
translation_result = lib.yolol_translate(buff, len(buff))
# Use the result
str_with_spelling_mistake = translation_result.translation.decode()
# Possible content of str_with_spelling_mistake :
# "salu je veu dè phaut!!"
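One detail worth spelling out: the size passed to yolol_translate is a number of bytes, not a number of characters, which is why len is called on the encoded buffer and not on the str. A small sketch, assuming the C++ side expects UTF-8 (the default of str.encode):

```python
# "dè phaut", written with an escape so the source file encoding does not matter
text = "d\u00e8 phaut"
buff = text.encode()  # UTF-8 by default

char_count = len(text)  # number of Unicode characters
byte_count = len(buff)  # number of bytes the C function will read

# The accented letter takes two bytes in UTF-8, so the counts differ
print(char_count, byte_count)
```

With French text full of accents, passing len(text) instead of len(buff) would truncate the input.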
You can see the file with the complete code here.
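The snippet above never hands the buffer back to yolol_free. The allocate, copy, free cycle can be sketched without the Yolol library by using strdup and free from libc as stand-ins (an assumption for illustration, on a POSIX-like system). The key point is declaring the result as c_void_p: with c_char_p as restype, ctypes converts the result to bytes and the original address is lost.

```python
import ctypes
import ctypes.util

# libc stands in for the Yolol shared library in this sketch
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char* strdup(const char*): keep the raw address so it can be freed later
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p

# void free(void*)
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def duplicate(text: str) -> str:
    """Copy a string through C memory and release the C buffer afterwards."""
    ptr = libc.strdup(text.encode())
    if not ptr:
        raise MemoryError("strdup failed")
    try:
        # string_at copies the null-terminated bytes into a Python object
        return ctypes.string_at(ptr).decode()
    finally:
        # Always give the memory back, even if decoding raises
        libc.free(ptr)


copied = duplicate("Salut, je veux des fautes!")
print(copied)
```

In the struct-based interface above, the pointer stays inside the YololTranslation instance, so passing that instance back to yolol_free once the string has been decoded should be enough.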
Why rewrite it?
I just wanted to try PyBind11 on something other than a "hello world" project. That's it, it wasn't hard to convince myself.
Use PyBind11
Installation
At first, I wanted to install it using Vcpkg but I ended up using CPM.cmake instead. Why? Because on some platforms Vcpkg messes up the configuration and picks a version of Python used in Vcpkg instead of the one I wanted to use. It is not that hard to fix, but why bother when I can install the library in one line with CPM.cmake:
CPMAddPackage("gh:pybind/pybind11#v2.10.3")
If you want to learn more about CPM.cmake I advise you to read this article.
And if you really want to install it with Vcpkg but you run into the same problem as me (no error, but the module was not created, apparently because it was choosing the wrong Python; that was so much fun to figure out -_-), this line may help you:
find_package(Python COMPONENTS Interpreter Development)
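To check which interpreter you are actually targeting, a small sketch like this one, run with the Python you intend to use, prints values you can compare against what CMake reports during configuration. The file name of the built module normally ends with the extension suffix shown here, so a mismatch is a good hint that the wrong Python was picked.

```python
import sys
import sysconfig

# Path of the interpreter running this script
interpreter = sys.executable

# Major.minor version, e.g. "3.10"
version = "{}.{}".format(*sys.version_info[:2])

# Suffix expected on compiled extension modules for this interpreter
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

print("interpreter:", interpreter)
print("version:", version)
print("extension suffix:", ext_suffix)
```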
Create a module with CMake
This part is really simple. You have the CMake function pybind11_add_module, which acts like add_library or add_executable.
In my project I did this:
# You can use as many source files as you want
pybind11_add_module(yolol source_file_1)
# You can link it with a library as you would do with a regular target
target_link_libraries(yolol PUBLIC YololTranslator)
Create the bindings in the C++ code
The most basic example from the documentation is this:
#include <pybind11/pybind11.h>
int add(int i, int j) {
return i + j;
}
PYBIND11_MODULE(example, m) {
m.doc() = "pybind11 example plugin"; // optional module docstring
m.def("add", &add, "A function that adds two numbers");
}
It creates a module named example, and in this module you have a function named add. The prototype of the function is deduced, so you just need to pass a pointer to this function, nothing else.
That's cool, but I don't want to export a function, I want to export a class with a constructor and a member function. That's almost as easy:
#include <Translator.hpp>
#include <pybind11/pybind11.h>
// Because I'm lazy and there is no risk of name collision
namespace py = pybind11;
PYBIND11_MODULE(yolol, m) {
m.doc() = "Yolol bindings";
py::class_<Translator>(m, "Translator")
.def(py::init<std::string_view, std::string_view, std::string_view>())
.def("translate", &Translator::operator());
}
I created a class using py::class_ with its name, then I defined the constructor, which in Python corresponds to the __init__ method. For this constructor I need to specify the types of the arguments.
Then I added a method named translate that uses operator(). You may find it weird, but it works well because an operator overload is a regular member function.
I can still improve this a little. The constructor has three arguments of the same type, which can create confusion, and I might mix up the arguments. Python has named arguments, so let's use them:
#include <Translator.hpp>
#include <pybind11/pybind11.h>
namespace py = pybind11;
PYBIND11_MODULE(yolol, m) {
m.doc() = "Yolol bindings";
py::class_<Translator>(m, "Translator")
.def(py::init<std::string_view, std::string_view, std::string_view>(),
py::arg("phonem_to_chars_file"),
py::arg("word_to_phonem_file"),
py::arg("word_to_word_file"))
.def("translate", &Translator::operator());
}
I just needed to add some py::arg with the names of the arguments, that's all. Easy, isn't it?
Use the bindings in the Python code
Now that the bindings are created, I want to use them. For that, I just need to import the module and then I can use my Translator class as a normal Python class:
from yolol import Translator
# If the .so/.pyd is in another directory you can write the path
# But instead of / use a .
# from dir.subdir.yolol import Translator
# Create the Translator
translator = Translator(word_to_phonem_file='a_path.txt', phonem_to_chars_file='another/path.json', word_to_word_file='foo/bar/beer.json')
# Use it
# (the variable is not named "str" to avoid shadowing the built-in type)
result = translator.translate("Le reblochon c'est trop bon.")
print(result) # Possible output: "l reblochon sê trrau bon"
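If the import fails, a quick way to see whether Python can find the compiled module at all is importlib.util.find_spec. A minimal sketch, assuming the module is named yolol as in the bindings above:

```python
import importlib.util

# find_spec returns None when no module with this name is on sys.path
spec = importlib.util.find_spec("yolol")

if spec is None:
    message = "yolol not found; check that the .so/.pyd is on sys.path"
else:
    # origin is the path of the file that would be imported
    message = "yolol found at {}".format(spec.origin)

print(message)
```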
Some notes that may help you
-fPIC missing
If you have an error like this while building:
warning: relocation against `_ZTVN8nlohmann16json_abi_v3_11_26detail10type_errorE' in read-only section `.text.unlikely'
#11 25.25 /usr/bin/ld: src/libYololTranslator.a(WordTranslator.cpp.o): relocation R_X86_64_PC32 against symbol relocation R_X86_64_PC32 against symbol `_ZZNSt8__detail18__to_chars_10_implImEEvPcjT_E8__digits' can not be used when making a shared object; recompile with -fPIC
That's because, as the error says, the -fPIC flag is missing. The clean way to add it in CMake is:
# For the whole project
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
# For just a target
set_target_properties(my_target PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
Missing Python header
If you have an error like this:
CMake Error in CMakeLists.txt:
Imported target "pybind11::module" includes non-existent path
"/usr/include/python3.10"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
It means that the Python headers are not installed. On Ubuntu (and probably all Debian-based distributions) you can fix this by installing them with apt (here as a Dockerfile instruction):
RUN apt install python3-dev
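To confirm whether the headers are present for a given interpreter, a small sketch like this one looks for Python.h in the include directory that this interpreter reports. It only checks the interpreter that runs the script, which may differ from the one CMake selected.

```python
import os
import sysconfig

# Directory where this interpreter expects its C headers
include_dir = sysconfig.get_paths()["include"]

# Python.h is the main header needed to build extension modules
header_path = os.path.join(include_dir, "Python.h")
has_header = os.path.exists(header_path)

print("include directory:", include_dir)
print("Python.h found" if has_header else "Python.h missing")
```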
Article::~Article
We have seen two ways to call my C++ code from Python: the ctypes module and PyBind11. In each case the example was quite trivial and, especially for PyBind11, we did not use these tools to their maximum potential. There are a lot of other features we did not see. You can consider this article more like an introduction to the wonderful world of bindings between C++ and Python.
Note that PyBind11 is not the only library to create bindings between C++ and Python. There is also:
- Boost.Python, which is kinda the predecessor of PyBind11
- Nanobind, made by the creator of PyBind11. It has a similar interface, but it takes advantage of C++17 and aims for bindings that are more efficient in space and speed.