Meilisearch vs Manticore Search

Sergey Nikolaev - May 2 '23 - Dev Community

Introduction

Search engines power search functionality across a wide range of platforms. Meilisearch and Manticore Search are two popular options, each with its own strengths. Choosing between them requires understanding their performance, use cases, and limitations. This article compares the two on feature set, data ingestion, and search performance, using three real-world benchmarks: 10 million NGINX logs, the Hacker News 1.1 million document dataset, and the Hacker News 116 million document dataset, all available at DB Benchmarks. All performance test scripts, configurations, and data collections are publicly available, so the results are reproducible.

Full-text Search Relevance

Both Manticore and Meilisearch position themselves as full-text search engines. The key element in full-text search engines is how they rank documents during a search.

Choosing the right search ranking algorithm is crucial to ensure users can find the information they need with precision and recall. In the context of full-text search relevance, it is essential to understand how these algorithms work and how they contribute to providing accurate and meaningful search results.

Manticore Search is very flexible in controlling search ranking and exposes dozens of ranking factors; however, by default, it employs the classical BM25 algorithm and its derivatives. BM25 is a well-established information retrieval algorithm that calculates the relevance of documents based on term frequency and inverse document frequency.
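To make the term frequency and inverse document frequency idea concrete, here is a minimal sketch of textbook Okapi BM25 scoring. It is an illustration only, not Manticore's implementation: the function name `bm25_scores`, the tiny corpus, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with textbook BM25.

    k1 and b are conventional defaults assumed for this sketch.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical three-document corpus, already tokenized on whitespace.
docs = [
    "manticore search is a full text search engine".split(),
    "meilisearch is a search engine".split(),
    "nginx writes access logs".split(),
]
scores = bm25_scores(["search", "engine"], docs)
print(scores)
```

Documents that contain none of the query terms score zero, while the others are ranked by how often and how distinctively the terms appear relative to document length.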

An ongoing pull request for the BEIR (Benchmarking and Evaluation of Information Retrieval) benchmark demonstrates Manticore Search's commitment to search relevance. BEIR is an evaluation framework that measures the performance of information retrieval systems on various tasks, such as document retrieval and question-answering. The results of the BEIR benchmark can be found here: https://docs.google.com/spreadsheets/d/1_ZyYkPJ_K0st9FJBrjbZqX14nmCCPVlE_y3a_y5KkYI/edit#gid=0.

In contrast, Meilisearch claims to offer good search relevance, but there are no public benchmarks available to substantiate this assertion. According to a discussion on Hacker News, Meilisearch users have mentioned its search relevance, but without any empirical evidence, it is difficult to compare its performance to Manticore Search objectively.

Overall, Manticore Search's use of proven ranking algorithms and participation in the BEIR benchmark highlights its commitment to providing highly relevant search results, making it a reliable choice for various applications. While Meilisearch may excel at full-text search relevance too, it is difficult to make a definitive statement since there are no established benchmarks and the algorithm used is not widely known.

Index Size and Data Ingestion

Manticore Search handles large datasets effectively (for example, the 1.7-billion-document taxi rides test, or Craigslist.org) by offering both row-wise and columnar storage. The columnar storage is specifically designed to accelerate search and lower RAM consumption on large datasets, while the default row-wise storage offers unbeatable performance on small and medium datasets. This flexibility makes Manticore Search an ideal choice for a wide range of applications.

Meilisearch, on the other hand, struggles with larger datasets, as we could not load the Hacker News larger dataset into the search engine even after 2 days of loading. Furthermore, Meilisearch experiences a degradation in performance when loading documents. As the dataset grows, the time it takes to load each subsequent batch of documents increases. This performance issue indicates that Meilisearch has a problem with data scalability and could be problematic for applications that require real-time data ingestion or indexing of large datasets. Meilisearch processes document updates in a single queue, which can lead to bottlenecks and reduced performance over time.

It is crucial to note that document updates in Meilisearch are not instantly reflected in search queries. This is because Meilisearch employs an asynchronous task queue for handling updates, ensuring search performance remains stable even during intensive indexing operations.

When updating a document, the change is added to the task queue and processed by the engine in the background. Once the task is completed, the updated data becomes available in the search results. The processing time can vary depending on the update size and server resources. To monitor task status, you can utilize the Tasks API, which offers information on task progress and completion.
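The enqueue-then-poll flow described above can be sketched as follows. This is a hedged illustration rather than official client code: `wait_for_task` and `fetch_task` are hypothetical names, and the status strings reflect Meilisearch's task states as I understand them, so check them against the version you run. In a real application `fetch_task` would call the Tasks API over HTTP; here a fake stands in so the sketch runs without a server.

```python
import time

# Assumed terminal task states; verify against your Meilisearch version.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_task(fetch_task, task_uid, interval=0.5, timeout=30.0, sleep=time.sleep):
    """Poll a task until it reaches a terminal status or the timeout expires.

    fetch_task is a hypothetical callable that returns a dict with a
    "status" key for the given task uid.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_uid)
        if task["status"] in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uid} still {task['status']}")
        sleep(interval)

# Fake fetcher that simulates a task moving through the queue.
_states = iter(["enqueued", "processing", "succeeded"])

def fake_fetch(uid):
    return {"uid": uid, "status": next(_states)}

result = wait_for_task(fake_fetch, 42, interval=0, sleep=lambda seconds: None)
print(result)
```

Only after the task reports success can a search be expected to reflect the update, which is why code that writes and then immediately reads needs a wait step like this one.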

Manticore, by contrast, offers real-time insert, replace, and delete capabilities: changes become visible in search results as soon as the write query completes.

In summary, while Meilisearch provides fast and efficient search capabilities, keep in mind that updates to documents might not be immediately visible in search results due to the asynchronous task processing.

Search Performance

Meilisearch is known for its impressive speed, outperforming Elasticsearch in many cases. However, its speed advantage is most noticeable on small datasets. As the dataset size increases, Meilisearch's performance may decline.

Manticore Search consistently delivers fast query performance for various query types and dataset types, outperforming both Meilisearch and Elasticsearch. With optimized row-wise and columnar indexing methods, Manticore ensures a responsive search experience, crucial for maintaining user engagement in high-performance applications.

In contrast, Meilisearch struggles with efficiently handling large datasets and suffers from performance degradation during document loading. Therefore, Manticore is the superior choice for those who don't want to worry about their dataset size.

Benchmark Tests

Hacker News Small Dataset (Hacker News Comments)

The Hacker News small dataset benchmark, which features a collection of 1.1 million curated Hacker News comments with numeric fields (source: https://zenodo.org/record/45901/), highlights the higher search performance of Manticore Search over Meilisearch. The dataset contains textual data from comments and numeric fields such as upvotes, timestamps, and user IDs. The benchmark test involves running full-text and analytical queries to assess the search engines' capabilities.


The benchmark results can also be verified through this link.

Unfortunately, Meilisearch is not capable of executing many types of queries, such as aggregation queries and those with negative full-text search terms.

An interesting aspect of this benchmark is the significant difference in disk space usage between the two search engines:

root@perf3 /perf/test_engines/tests/hn_small/manticore # du -sh idx
1.1G    idx

root@perf3 /perf/test_engines/tests/hn_small/meilisearch # du -sh .
38G     .

Meilisearch requires about 34 times more disk space than Manticore Search to store the same dataset.

Fully loading the data took:

  • Meilisearch: 31 minutes
  • Manticore: 65 seconds
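The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported for this benchmark (38 GB vs 1.1 GB on disk, 31 minutes vs 65 seconds to load); the helper name `ratio` is hypothetical, and because `du -sh` rounds its output, the disk ratio is approximate.

```python
def ratio(larger, smaller):
    """How many times larger the first value is than the second."""
    return larger / smaller

# Disk usage in GB, as printed by du -sh (rounded values).
disk_ratio = ratio(38, 1.1)

# Load time in seconds: 31 minutes for Meilisearch, 65 seconds for Manticore.
load_ratio = ratio(31 * 60, 65)

print(f"{disk_ratio:.1f}x disk, {load_ratio:.1f}x load time")
```

On this dataset the gap in loading time is of the same order as the gap in disk usage.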

Hacker News Large Dataset (116 million comments)

This test involves the same 1.1 million curated Hacker News comments dataset (source: https://zenodo.org/record/45901/), but multiplied 100 times, resulting in about 116 million documents. The benchmark covers both full-text and analytical queries, making it an excellent test case for evaluating search engine capabilities on a larger scale.

Meilisearch could not finish loading the data: insert performance degraded as the database grew, and after about 2 days it had loaded only 38% of the documents while already consuming over 850 GB of disk space. We attempted to optimize loading but were unsuccessful, because all batches, even when we sent them in parallel, went into a single queue, so we could not achieve any improvement. This is a stark contrast to Manticore Search, which stored the entire dataset in approximately 100 GB of disk space and loaded it in 2 hours 9 minutes using a single CPU core (load speed scales almost linearly with the number of cores).
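To put these numbers in perspective, the sketch below derives Manticore's average ingestion rate and a naive linear projection for Meilisearch from the figures reported in this section. The projection is an assumption, not a measurement: since insert speed degraded as the database grew, the real time to finish would likely be longer than a linear estimate suggests.

```python
TOTAL_DOCS = 116_000_000  # about 116 million documents

# Manticore: full load in 2 hours 9 minutes on a single CPU core.
manticore_seconds = 2 * 3600 + 9 * 60
manticore_docs_per_sec = TOTAL_DOCS / manticore_seconds

# Meilisearch: 38% loaded after about 2 days, using over 850 GB.
loaded_fraction = 0.38
meili_days_so_far = 2
meili_disk_gb_so_far = 850

# Naive linear extrapolation (optimistic, because loading slowed over time).
projected_days = meili_days_so_far / loaded_fraction
projected_disk_gb = meili_disk_gb_so_far / loaded_fraction

print(f"Manticore: ~{manticore_docs_per_sec:,.0f} docs/sec")
print(f"Meilisearch (linear projection): ~{projected_days:.1f} days, ~{projected_disk_gb:,.0f} GB")
```

Even under the optimistic linear assumption, a full load would take more than five days and over 2 TB of disk space.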

The inability of Meilisearch to process the entire Hacker News large dataset highlights its challenges in managing and scaling with more extensive data collections. Manticore Search's superior performance in this benchmark underscores its capacity to handle large-scale search requirements, making it a more suitable choice for applications with larger data collections.

Since we couldn't load the data into Meilisearch, you can check the Manticore-only results here.

10 million NGINX logs

This test is based on a dataset containing 10 million NGINX logs. The source of this dataset is Kaggle. Web server logs register various events, providing valuable insights into website visitors, user behavior, crawlers accessing the site, business intelligence, security issues, and more. The benchmark uses a curated list of typical queries that a DevOps engineer might run.

Manticore Search and Meilisearch differed significantly in disk space usage for this dataset. Manticore Search used 4.4 GB of disk space, while Meilisearch consumed 69 GB, which is roughly 16 times more. Although the difference is less dramatic than in the Hacker News small dataset test, it is still noteworthy, especially considering that the Logs10m dataset contains less text data.

It took Meilisearch around 20 minutes to load the data, whereas Manticore finished in 6 minutes.

You can find the detailed comparison of the performance results using the provided link. Please note that many of the empty results are simply due to Meilisearch being unable to handle certain types of queries. These queries were skipped during the benchmarking process.


Features Comparison of Manticore Search and Meilisearch

  • Full-text matching
    • ✅ Manticore: over 20 full-text operators. Percolate search (search in reverse).
    • ❌ Meilisearch: very simple: AND and phrase search. No percolate search.
  • Search Relevance
    • ✅ Manticore employs tried-and-proven classical ranking algorithms (BM25, BM15). The relevance is benchmark-proven. 7 built-in rankers and a custom ranker with 20+ ranking factors.
    • ❌ Meilisearch claims good search relevance but lacks public benchmarks for validation. 6 ranking rules.
  • Storage
    • ✅ Manticore: own row-wise storage for small/medium datasets, own columnar storage with lower RAM requirements suitable for larger datasets
    • ❌ Meilisearch: LMDB with all its advantages, disadvantages, and consequences: e.g., 205GB virtual memory requirement for a 9.1 MB dataset seems odd.
  • Index Size and Data Loading
    • ✅ Manticore accommodates large datasets with columnar and row-wise indexing methods. Easily sync data from MySQL, PostgreSQL, MS SQL, and any other database that supports ODBC, XML, and CSV. True real-time transactional inserts, replaces, and deletes. Binary log. In-place attribute value updates.
    • ❌ Meilisearch has difficulty with larger datasets and experiences performance degradation during document loading. You can upload CSV and JSON. Only asynchronous addition of documents. No in-place updates.
  • Schema
    • ✅ Manticore: Auto-schema. Auto-ID. All attributes are filterable, sortable, and groupable by default.
    • ❌ Meilisearch: Auto-schema. ID can be picked automatically from the document. All fields are full-text searchable by default, but attributes are not filterable or sortable. You must decide on the schema before loading data into the index to avoid full reindexing.
  • Search Performance
    • ✅ Manticore outperforms Meilisearch in search performance.
    • ❌ Meilisearch is less suitable for applications requiring fast and scalable search functionality.
  • High availability
    • ✅ Manticore: replication, distributed tables supporting remote agents with mirroring, and several HA strategies.
    • ❌ Meilisearch: no replication, no distributed searching, no mirroring.
  • Typo Tolerance
    • ✅ Meilisearch offers easier typo tolerance.
    • ❌ Manticore can handle typo tolerance but demands a higher effort in the app.
  • Search preview
    • ✅ Meilisearch features a helpful search preview - a built-in UI for searching through data in the instance.
    • ❌ Manticore does not have this feature.
  • Tokenization
    • ✅ Manticore: highly flexible tokenization: token chars, blended chars, ignored chars, regular expression tokenization rules, etc., wordforms, stopwords, synonyms, option to create tokenization plugins, morphology for various languages based on stemmers and lemmatizers.
    • ❌ Meilisearch: tokenizer depends on the language: Unicode segmenter for most languages, specific tokenizers for Chinese, Japanese, Hebrew, and Thai. Synonyms. Stopwords.
  • Authentication
    • ✅ Meilisearch: built-in authentication.
    • ❌ Manticore: no built-in authentication.
  • Interfaces
    • ✅ Manticore: SQL-first, you can connect using a MySQL client. HTTP JSON interface. Binary interface for extremely low response times. Clients for: PHP, Python, JavaScript, Java, C#, Elixir, Golang.
    • ❌ Meilisearch: HTTP JSON interface. Clients for: JavaScript, Python, PHP, Java, Ruby, Golang, C#, Rust, Swift, Dart.
  • Use Cases
    • ✅ Manticore: log search, e-commerce platforms, content-rich websites, enterprise applications.
    • ❌ Meilisearch: small-scale projects with limited data and search requirements.

Use Cases

Use cases for Manticore Search

  1. E-commerce platforms: Manticore Search can efficiently manage large product catalogs, providing relevant search results for customers with its advanced faceted functionality. This improves conversion rates and enhances the overall shopping experience, making it a highly sought-after feature for e-commerce platforms.
  2. Content-rich websites: Manticore Search can index and search through extensive content libraries, such as news sites, blogs, or knowledge bases. With proper full-text ranking, it ensures users find the information they need quickly and effectively, contributing to higher user engagement.
  3. Enterprise applications: Manticore Search's scalability and advanced search capabilities make it ideal for large-scale enterprise applications, including customer relationship management (CRM) systems, document management systems, and intranet portals, where accurate and efficient search functionality is critical.
  4. Log search: Manticore Search is great for searching in logs, as it can efficiently handle and search through huge logs. Its speed and performance make it an excellent choice for log analysis and monitoring.

Use cases for Meilisearch

  1. Small-scale projects: Meilisearch's lightweight nature and ease of deployment make it suitable for small projects with limited data and search requirements, such as small-scale e-commerce, personal websites, local directories, or simple web applications, where fast data loading, advanced search features, and scalability are not critical factors.

Conclusion

When choosing a search engine for your project, it is crucial to consider factors such as search relevance, scalability, and performance. Manticore Search stands out as the superior choice for diverse applications and use cases, ensuring optimal search performance and relevance regardless of dataset size. Its advanced search and analytics capabilities make it a reliable choice for projects that demand high-performance search functionality.

Meilisearch is suitable for small projects where advanced search features and scalability are not critical factors.

Ultimately, the choice between Manticore Search and Meilisearch will depend on your specific needs and project requirements.
