Semantic Search with KNN and LLM - Explained with a Poem

Mateusz Charytoniuk - Jan 25 - - Dev Community

You can build a simple semantic search engine by generating embeddings with an LLM and then searching through them with the KNN algorithm.

The entire process is surprisingly simple.

Running LLM Locally

You can use tools like Ollama or llama.cpp.

Let's say we have a Mistral LLM running on Ollama.

Preparing Your Data

Input Data

You need some data to search through. Let's assume you have a PDF document or something similar. We will use a funny poem as an example:

Daddy Fell into the Pond
By Alfred Noyes

Everyone grumbled. The sky was grey.
We had nothing to do and nothing to say.
We were nearing the end of a dismal day,
And then there seemed to be nothing beyond,
Then Daddy fell into the pond!

And everyone's face grew merry and bright,
And Timothy danced for sheer delight.
"Give me the camera, quick, oh quick!
He's crawling out of the duckweed!" Click!

Then the gardener suddenly slapped his knee,
And doubled up, shaking silently,
And the ducks all quacked as if they were daft,
And it sounded as if the old drake laughed.
Oh, there wasn't a thing that didn't respond
When Daddy Fell into the pond!

First, you need to split the text into chunks (that is the usual term): pieces of text of arbitrary size. A chunk can be an entire page or just a paragraph. Finding the best chunk length for your data is part of your research.
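As a rough illustration of chunking, here is a minimal Python sketch that splits the poem into one chunk per stanza by looking for blank lines. The names `POEM` and `split_into_stanzas` are made up for this example, and splitting on blank lines is just one possible strategy, not the only way to chunk text.

```python
import re

# The poem body (title and author left out), with stanzas separated by blank lines.
POEM = """\
Everyone grumbled. The sky was grey.
We had nothing to do and nothing to say.
We were nearing the end of a dismal day,
And then there seemed to be nothing beyond,
Then Daddy fell into the pond!

And everyone's face grew merry and bright,
And Timothy danced for sheer delight.
"Give me the camera, quick, oh quick!
He's crawling out of the duckweed!" Click!

Then the gardener suddenly slapped his knee,
And doubled up, shaking silently,
And the ducks all quacked as if they were daft,
And it sounded as if the old drake laughed.
Oh, there wasn't a thing that didn't respond
When Daddy Fell into the pond!
"""


def split_into_stanzas(text: str) -> list[str]:
    """Split text into chunks, treating one or more blank lines as the separator."""
    parts = re.split(r"\n\s*\n", text.strip())
    # Drop empty pieces and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


chunks = split_into_stanzas(POEM)
```

For a real document you would likely experiment with other boundaries (pages, paragraphs, fixed-size windows) and compare the search results.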

Generating Embeddings

We will use Ollama's embeddings endpoint:

POST /api/embeddings

Let's use one stanza per embedding, which gives us three of them. That is heavily simplified for the sake of the example; usually, you will have thousands or millions of embeddings (or more) to deal with.

After we send a chunk of text to Ollama, it forwards the chunk to the underlying LLM, which generates an embedding for it. You should end up with three vectors:

Everyone grumbled. The sky (...) -> [number, number, ...]
And everyone's face grew merry (...) -> [number, number, ...]
Then the gardener suddenly (...) -> [number, number, ...]
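Requesting an embedding could look roughly like the Python sketch below. It assumes Ollama is listening on its default local address (`http://localhost:11434`), that the model was pulled under the name `mistral`, and that the endpoint accepts a JSON body with `model` and `prompt` fields and answers with an `embedding` field. Check those details against the Ollama version you run. The helper names `build_request` and `embed` are made up for this example.

```python
import json
import urllib.request

# Assumptions: default local Ollama address and a model pulled as "mistral".
OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "mistral"


def build_request(chunk: str) -> urllib.request.Request:
    """Build the POST request that asks Ollama to embed one chunk of text."""
    payload = json.dumps({"model": MODEL, "prompt": chunk}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(chunk: str) -> list[float]:
    """Send one chunk to Ollama and return its embedding vector."""
    with urllib.request.urlopen(build_request(chunk)) as response:
        body = json.load(response)
    # Assumed response shape: {"embedding": [number, number, ...]}
    return body["embedding"]


# With Ollama running, one vector per stanza would be:
# vectors = [embed(chunk) for chunk in chunks]
```

Nothing is sent until `embed` is called, so you can inspect the request first if the server rejects it.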

What is an Embedding?

You can think of an embedding as a set of coordinates. If we want to place something in a 3D world, we use the [x,y,z] vector to point to that item.

The space used for text is much more complicated; embeddings can have 4096 or more such coordinates. The closer the embeddings are to each other, the more semantically similar the texts they represent.
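To make "closer" concrete, here is a small Python sketch that measures closeness between vectors in two common ways: Euclidean distance and cosine similarity. The three 3D vectors are made-up numbers chosen only for illustration; real embeddings come from the model and have far more coordinates. Which measure works best depends on the model, so treat this as a sketch rather than a recommendation.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3D "embeddings", for illustration only.
pond = [1.0, 2.0, 0.0]
lake = [1.0, 2.0, 1.0]
camera = [-3.0, 0.0, 4.0]

# "pond" is closer to "lake" than to "camera" under both measures.
print(euclidean_distance(pond, lake), euclidean_distance(pond, camera))
print(cosine_similarity(pond, lake), cosine_similarity(pond, camera))
```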

Semantic Search

So, how do we search for similar phrases? Simply put, we measure the distance between embeddings: the smaller it is, the more similar the texts are.

After the user enters a prompt, convert it to an embedding as well; for example, if the user asks "Who fell into the pond?", that question is also converted into a vector. Then you can use KNN (the k-nearest neighbors algorithm) to find the K embeddings closest to that vector. The chunks behind those embeddings are the ones most likely to be related to the query.
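The search step can be sketched in Python as a brute-force KNN: compute the distance from the query vector to every stored vector and keep the K smallest. The function name `k_nearest` and the 2D vectors below are made up for illustration; in practice both the stanza vectors and the query vector would come from the same embedding model. Euclidean distance is an assumption here, and other distance measures can be swapped in.

```python
import math


def k_nearest(query: list[float], embeddings: list[list[float]], k: int) -> list[tuple[float, int]]:
    """Return (distance, index) pairs for the k embeddings closest to the query."""
    scored = [
        (math.dist(query, vector), index)  # Euclidean distance (Python 3.8+)
        for index, vector in enumerate(embeddings)
    ]
    # Sort by distance (ties broken by index) and keep the k closest.
    return sorted(scored)[:k]


# Made-up 2D vectors standing in for the three stanza embeddings.
stanza_vectors = [
    [0.9, 0.1],  # stanza 1
    [0.2, 0.8],  # stanza 2
    [0.1, 0.9],  # stanza 3
]
# Made-up vector standing in for the embedded question.
query_vector = [0.8, 0.2]

for distance, index in k_nearest(query_vector, stanza_vectors, k=2):
    print(f"stanza {index + 1}: distance {distance:.3f}")
```

A linear scan like this is fine for a handful of vectors; it compares the query against every stored embedding, so the cost grows with the size of the collection.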

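Notice that you don't need extensive use of an LLM to perform such a search: the model is only needed to generate the embeddings, and the search itself is just a comparison of vectors.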

Summary

Where to go from here? Try setting up such a semantic search with your favorite LLM and framework, and experiment with different chunk sizes to see which produces the best results. Good luck!
