Manticore Search 6.3.0

Sergey Nikolaev - May 31 - Dev Community

We're excited to announce the release of Manticore Search 6.3.0! This version brings a host of enhancements, new features, and updates, making your search engine even more powerful and user-friendly.

Vector Search

  • Float vector data type: We've introduced the float_vector data type, which lets you store and query arrays of floating-point numbers. This is particularly useful for applications that rely on similarity search.
  • Vector search capability: Coupled with the new data type, the vector search feature enables you to execute k-nearest neighbor (KNN) vector searches. This is ideal for building more intuitive and responsive search functionalities in apps. Read more in the blog post Vector Search in Manticore.
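
To make the idea of a KNN search concrete, here is a minimal Python sketch that ranks stored vectors by Euclidean distance to a query vector. It is a brute-force illustration only, not a description of how Manticore executes KNN queries; the `knn` helper, the `docs` dictionary, and the sample vectors are all hypothetical.

```python
import math


def knn(query, vectors, k):
    """Return the ids of the k stored vectors closest to `query` (Euclidean distance)."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Sort every stored vector by its distance to the query, then keep the first k.
    ranked = sorted(vectors.items(), key=lambda item: l2(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical 4-dimensional vectors keyed by document id.
docs = {
    1: [0.1, 0.2, 0.3, 0.4],
    2: [0.9, 0.8, 0.7, 0.6],
    3: [0.1, 0.2, 0.3, 0.5],
}

nearest = knn([0.1, 0.2, 0.3, 0.4], docs, k=2)
print(nearest)
```

A full scan like this grows linearly with the number of documents, which is why search engines typically rely on an index for KNN instead.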

JOIN (beta)

The addition of JOIN capabilities in Manticore Search, although still in beta, is a significant enhancement to the way you can query data and manage relationships between tables. Read more in the documentation.

Example:

SELECT * FROM purchases AS p LEFT JOIN articles AS a ON a.id = p.article_id;
+------+------------+-------------+------+-------+-------------+
| id   | article_id | customer_id | id   | title | @right_null |
+------+------------+-------------+------+-------+-------------+
|    1 |          1 |          10 |    1 | book  |           0 |
|    2 |          1 |          11 |    1 | book  |           0 |
|    3 |          3 |          10 |    0 |       |           1 |
+------+------------+-------------+------+-------+-------------+
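
The LEFT JOIN semantics behind this result can be sketched in a few lines of Python. This is an illustration under assumptions: the contents of `purchases` and `articles` are reconstructed from the result table above (the real `articles` table may hold more rows), and `left_join` is a hypothetical helper, not part of Manticore.

```python
# Sample rows reconstructed from the result table (an assumption, not the real tables).
purchases = [
    {"id": 1, "article_id": 1, "customer_id": 10},
    {"id": 2, "article_id": 1, "customer_id": 11},
    {"id": 3, "article_id": 3, "customer_id": 10},
]
articles = [{"id": 1, "title": "book"}]


def left_join(left, right, left_key, right_key, right_defaults):
    """LEFT JOIN: keep every left row; fill defaults when nothing on the right matches."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[right_key], []).append(row)

    result = []
    for lrow in left:
        matches = by_key.get(lrow[left_key], [])
        if matches:
            # The last element mimics the @right_null column: 0 means a match was found.
            result.extend((lrow, rrow, 0) for rrow in matches)
        else:
            result.append((lrow, right_defaults, 1))
    return result


rows = left_join(purchases, articles, "article_id", "id", {"id": 0, "title": ""})
for p, a, right_null in rows:
    print(p["id"], p["article_id"], p["customer_id"], a["id"], repr(a["title"]), right_null)
```

The third purchase refers to article 3, which has no match, so the right-hand columns fall back to defaults and the null flag is set to 1, as in the table.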

REGEX

The new REGEX operator significantly improves how you can search for complex text patterns. This feature is especially important in areas that need very accurate search results, such as analyzing patents, reviewing contracts, and searching for trademarks.

For instance, in data analytics, the REGEX operator can help find specific error codes or programming patterns in log files or code. In academic research, it makes it easier to find articles that use certain citation styles. For trademark searches, this tool is excellent for spotting trademarks that are exactly the same or very similar. This enhancement makes Manticore Search much more powerful and precise for handling detailed and complex searches.

Read more in the blog post.

Example:

SELECT * FROM brands WHERE MATCH('"REGEX(/(c|sea).*crest/) REGEX(/flo(we|u)r/)"')
+---------------------+-----------------+
| id                  | name            |
+---------------------+-----------------+
| 1515699435999330620 | SeaCrest Flower |
| 1515699435999330621 | C-Crest Flour   |
| 1515699435999330622 | CCrest Flower   |
+---------------------+-----------------+
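
To see why these three names match, the two patterns can be tried against each word with Python's `re` module. This is only a sketch under assumptions: it lowercases the name, splits on whitespace, keeps the hyphen inside a token, and requires each pattern to match a whole token. Manticore's real tokenization depends on the table settings, and its regex engine may differ from Python's. The name "Crest Flower" is a hypothetical non-matching example that is not in the original table.

```python
import re

# The two patterns from the query, applied to consecutive tokens of a phrase.
PATTERNS = [re.compile(r"(c|sea).*crest"), re.compile(r"flo(we|u)r")]


def phrase_matches(name):
    """True if some run of consecutive tokens matches the patterns in order."""
    tokens = name.lower().split()
    width = len(PATTERNS)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(p.fullmatch(t) for p, t in zip(PATTERNS, window)):
            return True
    return False


names = ["SeaCrest Flower", "C-Crest Flour", "CCrest Flower", "Crest Flower"]
matched = [name for name in names if phrase_matches(name)]
print(matched)
```

"Crest Flower" is rejected because the first pattern needs a leading "c" or "sea" in front of "crest", and the single word "crest" cannot supply both.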

RANGE() and HISTOGRAM()

The new RANGE function enhances aggregation, faceting, and grouping by categorizing values into specified intervals. These intervals are defined using range_from and range_to, which determine the boundaries within which values fall. This functionality allows for effective sorting and analysis of data based on user-defined ranges.

Example:

SELECT * FROM test;
+---------------------+-----------+-------+
| id                  | data      | value |
+---------------------+-----------+-------+
| 8217240980223426563 | Product 1 |    12 |
| 8217240980223426564 | Product 2 |    15 |
| 8217240980223426565 | Product 3 |    23 |
| 8217240980223426566 | Product 4 |     3 |
+---------------------+-----------+-------+

SELECT COUNT(*), RANGE(value, {range_to=10},{range_from=10,range_to=25},{range_from=25}) price_range FROM test GROUP BY price_range ORDER BY price_range ASC;
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        1 |           0 |
|        3 |           1 |
+----------+-------------+
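
The bucket assignment in this example can be reproduced with a short Python sketch. It assumes that range_from is inclusive and range_to is exclusive, and that a missing bound means the range is open on that side; `range_bucket` is a hypothetical helper written for illustration.

```python
from collections import Counter


def range_bucket(value, ranges):
    """Return the index of the first range containing `value`, or None if none does."""
    for index, bounds in enumerate(ranges):
        low = bounds.get("range_from")   # assumed inclusive
        high = bounds.get("range_to")    # assumed exclusive
        if (low is None or value >= low) and (high is None or value < high):
            return index
    return None


# The same three ranges as in the query above.
ranges = [{"range_to": 10}, {"range_from": 10, "range_to": 25}, {"range_from": 25}]
values = [12, 15, 23, 3]  # the `value` column of the test table

counts = Counter(range_bucket(v, ranges) for v in values)
print(sorted(counts.items()))
```

Only 3 falls below 10, so bucket 0 gets one row, while 12, 15, and 23 all land in bucket 1. No value reaches 25, which is why bucket 2 does not appear in the result.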

The HISTOGRAM() function in Manticore Search categorizes data into buckets of a fixed size. For each value it returns the key of the bucket the value falls into, using the hist_interval and hist_offset parameters to determine the bucket. In the example below, with hist_interval=10, the keys are the lower bounds of the buckets (0, 10, 20). This feature is especially useful for building histograms, which group data into fixed-width value ranges for easier analysis and visualization.

Example:

SELECT COUNT(*), HISTOGRAM(value, {hist_interval=10}) AS price_range FROM test GROUP BY price_range ORDER BY price_range ASC;
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        1 |           0 |
|        2 |          10 |
|        1 |          20 |
+----------+-------------+
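
The keys in this result are consistent with the usual fixed-width bucketing formula, sketched below in Python. The formula is an assumption inferred from the example output, not a statement of Manticore's internal implementation, and `histogram_key` is a hypothetical helper.

```python
import math
from collections import Counter


def histogram_key(value, hist_interval, hist_offset=0):
    """Lower bound of the fixed-width bucket that contains `value` (assumed formula)."""
    return math.floor((value - hist_offset) / hist_interval) * hist_interval + hist_offset


values = [12, 15, 23, 3]  # the `value` column of the test table

hist = Counter(histogram_key(v, hist_interval=10) for v in values)
print(sorted(hist.items()))
```

With an interval of 10, the value 3 maps to key 0, both 12 and 15 map to key 10, and 23 maps to key 20, which matches the counts in the table.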

There are also DATE_RANGE() and DATE_HISTOGRAM() for similar aggregations over date/time data.

New commands to simplify data updates and schema management

Replication-related changes

We've made significant changes to replication to improve how data is transmitted between nodes. A replication error that occurred when transferring large files has been fixed, a retry mechanism for command execution has been added, and network management during replication has been improved. Blocking issues during replication and attribute updates have also been resolved, and nodes joining the cluster can now skip replicated update commands. Together, these changes make replication more efficient and reliable across a range of usage scenarios.

For detailed information about the changes, see here.

License change and performance optimizations

We've changed the Manticore Search license to GPLv3-or-later. This new license offers better legal safety for users and works better with other open-source licenses. This change shows our dedication to meeting the needs of the community and keeping open-source software strong.

In version 6.3.0, we added the Apache 2 licensed CCTZ library, which makes date/time functions much faster. Look at the improvement:

Before:

mysql> select count(*),year(time_local) y, month(time_local) m from logs10m where y>2010 and m<5;
+----------+------+------+
| count(*) | y    | m    |
+----------+------+------+
| 10365132 | 2019 |    1 |
+----------+------+------+
1 row in set (8.26 sec)

Now:

mysql> select count(*),year(time_local) y, month(time_local) m from logs10m where y>2010 and m<5;
+----------+------+------+
| count(*) | y    | m    |
+----------+------+------+
| 10365132 | 2019 |    1 |
+----------+------+------+
1 row in set (0.11 sec)

The query is now about 75 times faster (8.26 sec down to 0.11 sec).

We have also improved how tables are compacted. Previously, when merging disk chunks, Manticore removed deleted documents from every chunk that contained them, which used a lot of resources. We have dropped this approach. Chunk merging is now governed solely by the progressive_merge setting, which makes the process simpler and lighter on resources.

Ubuntu Noble 24.04

Ubuntu Noble 24.04 is now supported.

And many more

The updates highlighted above are just a part of the many improvements included in Manticore 6.3.0. Please read about:

🚀 9 major changes
✅ 50+ minor changes
🐞 120+ bug fixes

in the changelog.

We hope you enjoy the new features and improvements in Manticore Search. We welcome your feedback and encourage you to engage with us by:

  • Starting a discussion on our Community Forum
  • Reporting bugs or suggesting new features on GitHub
  • Joining the conversation in our Public Slack Chat
  • Emailing us directly at contact@manticoresearch.com