Manticore Buddy: challenges and solutions

Sergey Nikolaev - Feb 28 '23 - Dev Community

Hey there! 🤗 We hope y’all have already checked out our Buddy Intro and have a good understanding of how it works. We want to share our journey and experiences developing it and the challenges we faced.

At Manticore Software, we encountered two main challenges:

  • Expanding Manticore Search with non-performance-critical features without modifying the C++ code.
  • Making it easier to contribute to enhancements and new feature implementations.

We were determined to find a solution. So, let’s dive into our journey to develop Buddy and the issues we faced. Ready? Let’s go!

The beginning of the journey

Our journey started by examining the issue closely. Although Manticore Search is an exceptional search database product, we struggled to release new features at the desired speed because the codebase is written in C++. Writing C++ means working close to the machine: low-level data structures, bytes, compilation, debugging, and a search for the best-performing approach. The result is fast program execution, but the development itself is slow. We wanted to move quickly, ship more features, and do so consistently.

We came up with the idea of creating a companion for our primary searchd process, which could process failed queries from Manticore Search and return results to the original client. We didn’t take long to decide which language to use and settled on PHP for several reasons:

  • Most of the core team was familiar with it, so it would take less time to make it work.
  • PHP is fast, especially in recent versions (8+), even though we did not need high performance from Buddy. In our experience it holds up well against other scripting languages such as Python or JavaScript, so it was a good fit for our requirements.
  • PHP is not only fast but also simple, reducing the level of expertise needed to contribute to the future ecosystem.

That’s why we chose PHP and began implementing basic code to understand what we would need later.

We still use C++ to develop speed-critical features, where it remains the ideal choice. For tasks that don’t need that much speed, Buddy is the better fit.

This is how Buddy was created. To make this possible on the C++ side of Manticore Search, we implemented a separate loop and communication between the searchd daemon and the Buddy PHP process using CURL. We developed an internal protocol based on simple JSON to route queries to Buddy, which handles them and sends back an appropriate response for searchd to pass to the original client.
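To make the idea of a simple JSON protocol concrete, here is a minimal sketch of what such an envelope could look like. It is written in Python rather than Buddy's PHP purely for brevity, and every field name (`version`, `query`, `error`) is a hypothetical placeholder, not Manticore's actual wire format.

```python
import json

# Hypothetical envelope fields; the real Buddy protocol may differ.
REQUIRED_FIELDS = {"version", "query", "error"}


def build_buddy_request(query: str, error: str) -> str:
    """Serialize a query the daemon could not handle, plus its error."""
    return json.dumps({"version": 1, "query": query, "error": error})


def parse_buddy_request(raw: str) -> dict:
    """Decode the envelope and reject payloads missing required fields."""
    payload = json.loads(raw)
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload
```

Carrying the original error inside the envelope means the receiving side can hand it back unchanged when it has nothing better to offer.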

Implementing the communication protocol

When starting a new project, it’s important to stay flexible and not overplan. In our case, we began with a basic implementation of communication using the sockets extension in PHP. While it worked well, it wasn’t scalable. Our goal was to connect Manticore Search with Buddy, and this simple implementation allowed us to validate that idea.

Instead of reinventing the wheel, we researched options for making the system more scalable and non-blocking. We initially considered OpenSwoole, but due to a license issue, we couldn’t use it. We then found ReactPHP, which had a suitable license. So, we decided to go with ReactPHP.

ReactPHP is a plain PHP framework that allows us to run a TCP server in async mode.

This choice worked well for us since it allowed us to handle multiple requests simultaneously and easily scale the system.

Next, we rewrote our simple Buddy MVP and created a core structure that would make it easy to add handling of new SQL commands in the future. The process is as follows:

  • Manticore Search receives an SQL query from the user and attempts to handle it.
  • If Manticore Search can handle the query, it returns the response to the client without involving Buddy.
  • If Manticore Search cannot handle the query, it sends a special structure with all information about the input query and any errors to Buddy.
  • Buddy parses the structure and checks if there is a handler for it. If there isn’t, it returns the same error that Manticore Search would send to the client.
  • If everything is good and we have an implementation to handle the query, we split the process into two steps: preparing the request with the required data and handling it with the handler logic. The request is a simple structure that represents a class with predefined variables and input parameters parsed from the input SQL query. If anything goes wrong, it may fail and return an error to the client.
  • The handler then receives the request, does the necessary work, and returns the final result in the HTTP response.
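The flow in this list can be sketched as a small dispatcher. This is an illustration in Python rather than Buddy's PHP, and the `Request` shape, the `ECHO` command, and the handler registry are all hypothetical names invented for the example, not Buddy's real classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    """Parameters parsed from the incoming SQL query (hypothetical shape)."""
    command: str
    args: list


# Registry mapping a command keyword to its handler; names are illustrative.
HANDLERS: dict = {}


def handler(command: str) -> Callable:
    """Decorator that registers a function as the handler for a command."""
    def register(fn: Callable) -> Callable:
        HANDLERS[command] = fn
        return fn
    return register


@handler("ECHO")
def handle_echo(request: Request) -> dict:
    return {"data": " ".join(request.args)}


def prepare_request(query: str) -> Optional[Request]:
    """Step 1: build a Request if some handler matches, else return None."""
    parts = query.strip().split()
    if not parts or parts[0].upper() not in HANDLERS:
        return None
    return Request(command=parts[0].upper(), args=parts[1:])


def process(query: str, original_error: str) -> dict:
    request = prepare_request(query)
    if request is None:
        # No handler: return the same error the daemon would have sent.
        return {"error": original_error}
    try:
        # Step 2: run the handler logic on the prepared request.
        return HANDLERS[request.command](request)
    except Exception as exc:
        return {"error": str(exc)}
```

Keeping request preparation separate from handling means that adding a new command only requires registering one more handler.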

This system is simple, easy to maintain, and easy to extend with new functionality. However, there is an issue with this approach: if a handler does something heavy or has to wait, it can slow down concurrent requests. Async is not parallel. PHP code still blocks, and ReactPHP only emulates concurrency on a single thread, so one long-running handler holds up every other request. We’ll discuss this issue in more detail in the next section.
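The effect is easy to reproduce in any single-threaded event loop. The sketch below uses Python's asyncio as a stand-in for ReactPHP (an assumption made only for illustration): one handler makes a blocking call, and a second, trivial handler cannot start until the first one finishes.

```python
import asyncio
import time


async def blocking_handler() -> None:
    # time.sleep blocks the whole event-loop thread, like heavy synchronous code.
    time.sleep(0.2)


async def light_handler(started: float) -> float:
    # Returns how long this request waited before it got a chance to run.
    return time.monotonic() - started


async def main() -> float:
    started = time.monotonic()
    heavy = asyncio.create_task(blocking_handler())
    light = asyncio.create_task(light_handler(started))
    await heavy
    return await light


delay = asyncio.run(main())
print(f"light request waited {delay:.2f}s behind the blocking one")
```

Although both tasks are scheduled at the same time, the light one is only served after the blocking call returns, which is why heavy work has to leave the loop thread.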

Async problem in PHP and scale for concurrency

To handle heavy loads, our team at Manticore Search needed a solution that could handle more requests than ReactPHP. While ReactPHP worked for implementing an async HTTP server and handling some concurrency, it wasn’t scalable enough. After a quick search, we chose to use the parallel extension over pthreads because of its maintainability and reliability.

But what is parallel? It is a PHP extension for parallel execution: it creates independent Runtimes, each representing a thread that runs in parallel with the others. These threads communicate through channels, which parallel provides, so we can send data from the main ReactPHP loop to a detached thread in another Runtime without reinventing the wheel.

This approach was our silver bullet to handle high levels of concurrency and keep response time low. To make it happen, we implemented a Task component to run tasks in a parallel thread at the Handler level. This way, the main process didn’t block, allowing us to handle many concurrent requests easily.

Initially, we created a Runtime on each request and destroyed it at the end of execution. However, this approach degraded performance when many requests arrived. To handle more requests, we now create the Runtimes once at startup and use them in a round-robin fashion. This also lets us cap the number of threads so that it does not exceed the number of CPU cores, since oversubscribing the cores would hurt performance as well.
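A minimal sketch of the pre-allocation idea follows, in Python rather than Buddy's PHP. Plain threads and queues stand in for parallel's Runtimes and channels, and the `RoundRobinPool` class is a hypothetical name. It illustrates the scheduling pattern only and is not a reproduction of Buddy's Task component.

```python
import itertools
import queue
import threading


class RoundRobinPool:
    """Fixed set of worker threads created once and reused for every task."""

    def __init__(self, size: int) -> None:
        self._inboxes = [queue.Queue() for _ in range(size)]
        self._turn = itertools.cycle(range(size))
        self._lock = threading.Lock()
        for index, inbox in enumerate(self._inboxes):
            worker = threading.Thread(
                target=self._work, args=(index, inbox), daemon=True
            )
            worker.start()

    @staticmethod
    def _work(index: int, inbox: queue.Queue) -> None:
        # Each worker loops forever, so no thread is created per request.
        while True:
            func, args, outbox = inbox.get()
            try:
                outbox.put((index, func(*args)))
            except Exception as exc:
                outbox.put((index, exc))

    def submit(self, func, *args) -> queue.Queue:
        """Give the task to the next worker in turn; returns a result channel."""
        with self._lock:
            index = next(self._turn)
        outbox: queue.Queue = queue.Queue(maxsize=1)
        self._inboxes[index].put((func, args, outbox))
        return outbox
```

A caller would create the pool once at startup, for example `RoundRobinPool(size=os.cpu_count() or 1)`, and read each result from the returned queue. The pool size is the cap on thread count described above.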

While we solved the performance and concurrency problem with ease, we faced another challenge: deployment. Not all operating systems support the latest PHP 8 version, and some uncommon extensions are not included in the default installation. But we found a solution, and you can too.

Say hi to manticore-executor

We conducted research to find a painless way to ship the new tool to customers and discovered a great approach - compiling both PHP and Buddy into a single static binary. This involved embedding Buddy's PHP code into the PHP sources and building a runnable binary from them. However, that would have mixed two different licenses - the PHP License 3.01 and GPL 2.0 - which was not feasible. As a result, we chose to pre-build PHP on its own, link it statically, and name it manticore-executor.

Unfortunately, the process was not simple. We attempted to build it with Ubuntu but encountered a problem - we needed OpenSSL to establish secure connections to external domains. However, when using dynamic GCC, we couldn’t link OpenSSL statically.

Why did we use GCC? It was necessary for compiling PHP and its extensions. The issue was that we required a statically built GCC to link statically, which is not straightforward and takes a lot of work. As a result, we looked for alternatives.

Thankfully, we discovered musl and Alpine Linux, which allowed us to build a fully static version of PHP with all required extensions and libraries without difficulty! Furthermore, the resulting binary works on any Linux distribution.

Alpine Linux is an excellent choice for compiling C programs due to its small size and lightweight nature, which also makes it suitable for resource-constrained systems such as embedded devices or containers. It is also built with security in mind: it ships few packages by default, which limits the attack surface, and its userland binaries are compiled with hardening features such as position-independent executables and stack-smashing protection. This matters for C programs, which can be susceptible to memory-safety vulnerabilities.

In addition, Alpine Linux uses musl libc as its standard C library. musl is lightweight and designed to support static linking well, which is exactly what we needed.

As a result, we set up GitHub Actions to build it in Docker from an Alpine image. The beauty of this approach is that it also made it easier for us to build for ARM: Docker's buildx command can use QEMU emulation, so the same flow builds for both x86-64 (AMD64) and ARM architectures on the same machine! Check out our build flow here.

GitHub Actions automates the building and deploying of packages for all supported operating systems. For users, installation is simple: just run apt-get install manticore-executor or yum install manticore-executor, and you’ll have a PHP version ready to use with all necessary packages pre-installed to run any Manticore-shipped PHP project. Easy!

How we ship our source code

At Manticore Search, we faced the challenge of providing our PHP application, made up of multiple source code files, to the user. We had many files that were spread across separate folders and dependencies that had to be installed with Composer, making the installation process complicated.

As described above, we developed a custom PHP build, called manticore-executor, which can be easily installed from our repositories. However, this still did not solve the problem of providing the entire PHP application to the user.

We found a solution in PHAR, which allowed us to build a single file that could be added as a package to the repository. This simplified the installation process. However, ensuring that all dependencies were included correctly in the final PHAR archive was tricky. To solve this, we created a separate, external build system, which we also use for our manticore-backup tool.

To make the package executable, we use a Bash script with a shebang line that runs on our manticore-executor package. The script compares the modification date of the installed PHAR with the copy extracted into the system’s temporary folder and re-extracts the PHAR data there when needed. This keeps repeated launches fast while still picking up newly installed versions. For more information on how we implemented this, you can refer to our phar_builder project on GitHub.
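The freshness check can be sketched as follows. This is an assumption-laden illustration in Python, not the actual Bash launcher: the `ensure_fresh_copy` function and the `buddy-cache` directory name are hypothetical, and a plain file copy stands in for extracting the PHAR.

```python
import os
import shutil
import tempfile

# Hypothetical cache location inside the system's temporary folder.
DEFAULT_CACHE_DIR = os.path.join(tempfile.gettempdir(), "buddy-cache")


def ensure_fresh_copy(archive_path: str, cache_dir: str = DEFAULT_CACHE_DIR):
    """Refresh the cached copy only when the archive is newer than it.

    Returns the cached path and whether a refresh happened.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, os.path.basename(archive_path))
    source_mtime = os.path.getmtime(archive_path)
    if os.path.exists(cached) and os.path.getmtime(cached) >= source_mtime:
        # Cache is current: skip the expensive extraction step.
        return cached, False
    # Stand-in for extracting the PHAR; copy2 also preserves the mtime.
    shutil.copy2(archive_path, cached)
    return cached, True
```

Comparing modification times keeps the common case cheap: the costly step runs once after each package upgrade, and every later launch reuses the cached copy.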

Lessons learned

  1. Start with a simple, basic solution without using software design patterns when uncertain about the future success of a project. Prioritize validation first and then refactor and iterate on updates.
  2. Concurrency in PHP can be challenging, but using threading and async frameworks can help achieve high throughput. For optimal performance, it’s recommended to use both. Preallocating runtimes for threads can help reach desired performance.
  3. Simplify the shipping process for users. Reduce the number of instructions needed. In our case, one PHAR archive and one binary with all included extensions for our custom PHP solved the issue.
  4. Use the most recent versions of PHP or other tools to stay on top of the latest developments and keep your data secure. Outdated software can be vulnerable to hacks and security breaches. Upgrading offers improved performance, the latest features, and an efficient coding process.
  5. Look for packages that can solve your problem and examine their dependencies. Choose packages with minimal dependencies to avoid dependency hell. Use small packages like building blocks rather than creating a custom solution.

Final outcome

Throughout the development of Buddy, we faced numerous challenges, which we overcame with excitement. The tool is entirely written in PHP and is shipped as an OS package, making it incredibly easy for users to install and for us to maintain and automate builds, thanks to GitHub Actions. While there is still room for improvement to make the tool even simpler, our story demonstrates how it’s possible to build an easy-to-maintain and easy-to-install tool, all with the power of PHP.

We hope you enjoyed reading about our journey leading up to the release of Manticore 6.0.0. Be sure to stay tuned for our next article, where we explore the new pluggable design in Buddy and its easy-to-contribute ecosystem, which benefits the entire community. It’s truly an exciting time, and we can’t wait to share more with you.
