Introduction
Combining geospatial data with semantic data can unlock the potential for building powerful applications. Developers can create visually rich and highly effective applications by leveraging technologies like Qdrant, Llama2, and Streamlit, alongside frameworks such as LlamaIndex and LangChain.
The fusion of geospatial data, which provides information about physical locations, with semantic data, which adds meaning and context, opens up a world of possibilities. Imagine easily analyzing and visualizing vast geospatial datasets, then overlaying them with semantic insights to uncover hidden patterns and correlations. This is where the power of Qdrant, a high-performance vector database, comes into play. By efficiently storing and querying embeddings generated by LLMs from Hugging Face, Qdrant enables lightning-fast retrieval of relevant information.
By combining insights from natural language processing (NLP) models like Llama2 with geospatial datasets, developers can create applications that understand both the textual context and the spatial context of the data. The result is richer, more intelligent visualizations that allow users to gain deeper insights and make more informed decisions.
In this guide, I’ll show you how to use these tools together to build a campground search system with interactive map visualizations.
Why Vector Search and LLMs?
Why do we use Vector Search and LLMs (Large Language Models) in creating powerful visualizations with tools like Llama2, Streamlit, Folium, and Qdrant? Let’s break it down.
Firstly, Vector Search is essential because it helps us find similar items quickly and efficiently. Imagine you have a massive collection of data points spread across a map. With Vector Search, you can locate points that are similar to a given reference point in terms of their attributes or features. This capability is crucial for uncovering patterns, trends, and relationships within geospatial data.
LLMs, such as those developed by Hugging Face, play a vital role in this process. These models are trained on vast amounts of text data and can understand the context and meaning of words, phrases, and sentences. By converting text inputs into high-dimensional embeddings (representations), LLMs enable us to incorporate textual information into our visualizations.
Now, let’s connect the dots. Vector databases like Qdrant efficiently store and retrieve these embeddings, allowing fast and accurate searches. This means we can seamlessly combine the power of Vector Search with the capabilities of LLMs to create visualizations that not only represent geospatial data but also incorporate textual insights.
For example, imagine a map visualization where each point represents a location mentioned in news articles. By using Vector Search and LLMs, we can cluster similar locations together and overlay them with relevant news snippets, providing users with a comprehensive understanding of the geographical distribution of events and topics.
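To make the similarity idea concrete, here is a minimal sketch using numpy (assuming it is installed; the three-dimensional vectors are made up for illustration, as real embeddings have hundreds of dimensions). Cosine similarity is what lets us rank items by closeness:
import numpy as np

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: 1.0 means identical direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings" for three locations
park_a = np.array([0.9, 0.1, 0.3])
park_b = np.array([0.8, 0.2, 0.4])    # similar to park_a
downtown = np.array([0.1, 0.9, 0.2])  # very different

print(cosine_similarity(park_a, park_b))    # high score
print(cosine_similarity(park_a, downtown))  # low score
A vector database does exactly this comparison, but over millions of vectors with specialized indexes instead of a brute-force loop.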
Qdrant: Vector Similarity Search Technology
Let’s talk about Qdrant DB, a powerful tool that makes finding similar items a breeze. Qdrant DB is what we call a “vector database”: it stores data as high-dimensional vectors (embeddings) and can quickly retrieve the items whose vectors are closest to a query vector.
So, what’s the big deal with finding similar things? Well, think about it this way: let’s say you have a bunch of points on a map, each representing a different place. With Qdrant DB, you can quickly find other points on the map that are similar to a given point. This is super useful for all sorts of things like finding locations with similar characteristics or grouping points that belong to the same category.
One of the coolest things about Qdrant DB is its ability to handle high-dimensional data. This means it can work with data that has lots of different attributes or features, making it perfect for tasks like natural language processing (NLP), where we often deal with complex data structures.
But here’s where it gets even better: Qdrant DB isn’t just good at finding similar items — it’s also really fast. This means you can retrieve similar items from your dataset in the blink of an eye, even when dealing with huge amounts of data.
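To get a feel for how this works in practice, here is a minimal, self-contained sketch using the qdrant-client library’s in-memory mode (the collection name, vectors, and payloads are made up for illustration):
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Spin up a throwaway in-memory Qdrant instance
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="demo",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Store two points with toy 4-dimensional vectors and payloads
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"name": "Lakeside Camp"}),
        PointStruct(id=2, vector=[0.9, 0.1, 0.1, 0.1], payload={"name": "Desert Camp"}),
    ],
)

# Find the stored point most similar to a query vector
hits = client.search(collection_name="demo", query_vector=[0.1, 0.2, 0.3, 0.35], limit=1)
print(hits[0].payload)  # -> {'name': 'Lakeside Camp'}
In the rest of this guide we’ll do the same thing at scale, with real embeddings and a hosted cluster.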
Step-by-Step Guide to Building a Campground Search System with LlamaIndex
Building a Campground Search System with LlamaIndex opens up exciting possibilities for finding the perfect outdoor getaway spot. Leveraging Qdrant and LlamaIndex, you can create a seamless and efficient search experience for campers.
Download Campground Data
Before we dive into building our Campground Search System with LlamaIndex, let’s start by downloading the campground data. You can find the dataset at the following link: https://data.world/caroline/campgrounds
This dataset contains valuable information about various campgrounds, including their locations, amenities, and user ratings. Once downloaded, we’ll use this data to create our powerful visualization and search system.
Note: Make sure to save the downloaded dataset in a location accessible to your development environment.
Now, let’s proceed with building our Campground Search System using LlamaIndex and other advanced technologies.
Install Required Libraries
Before implementing the search for LlamaIndex with Qdrant, you’ll need to install several libraries to set up your development environment properly. Follow the steps below to install the necessary dependencies:
Install Python 3.11
To begin, ensure you have Python 3.11 installed on your system. You can download Python 3.11 from the official website here: https://www.python.org/downloads/release/python-3118/.
Install Qdrant Client
Next, install the Qdrant client library using pip:
pip install qdrant-client
This library allows your Python code to connect with the Qdrant vector database.
Install LlamaIndex
pip install llama-index
LlamaIndex provides functionalities for handling and indexing text data for search purposes.
Install LlamaIndex Vector Stores for Qdrant
pip install llama-index-vector-stores-qdrant
This library enables seamless integration between LlamaIndex and Qdrant, allowing you to index and search vector data efficiently.
Install LlamaIndex Embeddings for Hugging Face
This guide uses Hugging Face embeddings with LlamaIndex, so install the embeddings library:
pip install llama-index-embeddings-huggingface
This library provides support for using pre-trained Hugging Face models for text embedding.
Install LlamaIndex LLMs for LLAMA CPP
For LLAMA CPP integration with LlamaIndex, install the llama-cpp-python package along with its LlamaIndex integration:
pip install llama-cpp-python
pip install llama-index-llms-llama-cpp
These packages enable you to leverage the LLAMA CPP model for advanced natural language processing tasks within LlamaIndex.
Once you’ve installed these libraries, you’ll be ready to implement the search functionality for LlamaIndex with Qdrant in your Python environment.
Connect Qdrant to Cluster
Setting Up Qdrant Cloud
To begin using Qdrant Cloud, follow these steps:
Sign Up for Qdrant Cloud
Visit Qdrant Cloud and sign up for an account to access Qdrant Cloud services.
If you are an existing user, you can log in; otherwise, register using a Google account or email.
After login, the dashboard will be as shown below:
Create a Cluster
Follow the given steps to create a cluster:
Enter the name of the cluster you would like to add. For example, we will use ‘MAPP’ and then click on ‘Create Free Tier Cluster’.
Set Up API Key
To access your Qdrant cluster, you’ll need to set up an API key. Follow these steps to obtain and use your API key:
As shown in the figure above, simply click on the ‘API Key’ button to generate an API Key. Then, after generating the API Key (as shown in the figure below), copy it to connect to the Qdrant cluster.
Access Cluster URL
To access the cluster URL, click on the cluster from the dashboard. You will find the cluster URL displayed, as shown in the figure below:
Use the code below to connect Qdrant to your created cluster. Replace the placeholders with your actual cluster URL and API key:
from qdrant_client import QdrantClient

# Connect Qdrant to your created cluster.
client = QdrantClient(
    url="YOUR_CLUSTER_URL",
    api_key="YOUR_API_KEY",
)
Replace “YOUR_CLUSTER_URL” with the URL of your Qdrant cluster and “YOUR_API_KEY” with your actual API key. This information allows Qdrant to authenticate and establish a connection with your cluster.
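To confirm the connection works before going further, you can optionally list the collections in your cluster (a freshly created cluster returns an empty list):
# Sanity check: this call fails fast if the URL or API key is wrong
print(client.get_collections())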
Load External Data
To incorporate external data into your application, you can use the SimpleDirectoryReader class from LlamaIndex. Follow these steps to load external data:
Import Necessary Module
Ensure you have imported the required module for using the ‘SimpleDirectoryReader’ class:
from llama_index.core import SimpleDirectoryReader
This module provides functionalities for reading data from external sources.
Load External Data
Use the provided code to load the external data from the specified file (us_campsites.csv in this case):
# Load external data
documents = SimpleDirectoryReader(
    input_files=["caroline-campgrounds/data/us_campsites.csv"]
).load_data()
print(documents)
Replace “caroline-campgrounds/data/us_campsites.csv” with the path to your external data file. This code snippet loads the data from the specified file into memory for further processing.
The output displays the loaded data from the external file, including campground details such as longitude, latitude, name, city, code, and more.
By following these steps, you can seamlessly integrate external data into your application using the SimpleDirectoryReader class from LlamaIndex.
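Optionally, you can spot-check what was loaded before moving on. This is a minimal sketch; the exact text layout depends on how SimpleDirectoryReader parses the CSV:
# Peek at the first loaded document and its metadata
print(documents[0].text[:200])
print(documents[0].metadata)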
Text Parsing into Nodes
Once you’ve loaded the external data, the next step is to parse the text into nodes using the steps below:
Import Necessary Module
Ensure you have imported the required module for using the ‘SentenceSplitter’ and ‘TextNode’ classes:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TextNode
Create a Text Parser
First, create a text parser object using the SentenceSplitter class:
# Parse a text in a list
text_parser = SentenceSplitter(
    chunk_size=1024,
)
This text parser splits the text into smaller chunks for processing.
Split Text into Chunks
Use the text parser to split the text into chunks:
text_chunks = []
doc_idxs = []
for doc_idx, doc in enumerate(documents):
    current_text_chunks = text_parser.split_text(doc.text)
    text_chunks.extend(current_text_chunks)
    doc_idxs.extend([doc_idx] * len(current_text_chunks))
This code iterates through the documents, splits the text into chunks, and stores them in a list.
Construct Nodes from Chunks
Construct nodes from the text chunks:
# Construct nodes from the chunks
nodes = []
for idx, text_chunk in enumerate(text_chunks):
    node = TextNode(
        text=text_chunk,
    )
    src_doc = documents[doc_idxs[idx]]
    node.metadata = src_doc.metadata
    nodes.append(node)
This code snippet creates nodes from the text chunks, assigning metadata from the source documents to each node.
By following these steps, you can parse text into nodes for further processing and analysis in your application.
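As an optional sanity check, you can confirm how many nodes were produced and peek at the first one:
# Verify the parsed nodes before embedding them
print(f"Parsed {len(nodes)} nodes from {len(documents)} documents")
print(nodes[0].get_content()[:200])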
Embed the Node
After parsing the text into nodes, the next step is to embed each node using a pre-trained language model and LLAMA CPP:
Import Necessary Module
Ensure you have imported the required modules for using the ‘HuggingFaceEmbedding’, ‘LlamaCPP’, ‘StorageContext’, ‘VectorStoreIndex’, and ‘QdrantVectorStore’ classes:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core.storage.storage_context import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
Embed Text Nodes
Use the provided code to embed the text nodes:
# Embed the node
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")
for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
This code iterates through each node, extracts the text content, and embeds it using the specified pre-trained language model.
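To confirm the embeddings were attached, you can check the vector on one node; the BAAI/bge-small-en model produces 384-dimensional embeddings:
# Each node should now carry a 384-dimensional embedding
print(len(nodes[0].embedding))  # -> 384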
Initialize LLAMA CPP
Initialize LLAMA CPP for further processing:
# LLAMA CPP
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCPP(
    model_url=model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)
This code initializes LLAMA CPP with the specified model URL and parameters for generating responses.
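Before wiring the model into the pipeline, you can optionally run a one-off completion as a smoke test (note that the first call downloads the multi-gigabyte GGUF file):
# Optional smoke test: generate a short completion with the local model
print(llm.complete("Describe a good campground in one sentence."))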
Configure Global Settings
Configure the global Settings with the LLAMA CPP model and the embedding model:
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
This step registers the initialized LLAMA CPP model and embedding model as the defaults used by LlamaIndex components downstream.
Initialize Vector Store and Index
Initialize the vector store, storage context, and index using the provided code snippet.
vector_store = QdrantVectorStore(client=client, collection_name="MAP")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Add Nodes to Vector Store
Add the embedded nodes to the vector store:
vector_store.add(nodes)
This code adds the embedded nodes to the vector store for efficient storage and retrieval. (Note that VectorStoreIndex.from_documents above already embeds and stores the documents, so this step stores the data a second time; if you only want the manually embedded nodes, you could instead build the index directly from the vector store.)
By following these steps, you can embed the text nodes and set up the necessary components for further processing and analysis in your application.
Query and Retrieve
Once the nodes are embedded and stored, you can perform queries to retrieve the relevant information:
Import Necessary Module
Ensure you have imported the required modules for using the ‘VectorStoreQuery’ and ‘RetrieverQueryEngine’ classes, along with the typing helpers and schema classes used in the snippets below:
from typing import List, Optional

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.vector_stores import VectorStoreQuery
from llama_index.core.query_engine import RetrieverQueryEngine
Set Query Mode
Set the query mode to determine the type of search to perform:
query_mode = "default"
This specifies the default query mode for the search.
Perform Vector Store Query
Generate an embedding for your query string, then perform a query on the vector store to retrieve similar nodes:
query_str = "Beeds Lake State Park"
query_embedding = embed_model.get_query_embedding(query_str)

vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2
)
query_result = vector_store.query(vector_store_query)
print(query_result.nodes[0].get_content())
This code executes a query on the vector store using the specified query embedding and retrieves the top-k similar nodes.
Retrieve Nodes with Scores
Retrieve the nodes along with their similarity scores:
nodes_with_scores = []
for index, node in enumerate(query_result.nodes):
    score: Optional[float] = None
    if query_result.similarities is not None:
        score = query_result.similarities[index]
    nodes_with_scores.append(NodeWithScore(node=node, score=score))
This loop iterates through the query result nodes and their corresponding similarity scores, storing them in a list.
Perform Query Retrieval
Since the _retrieve method below references self, it belongs on a retriever class. Define a custom retriever that performs query retrieval:
class VectorDBRetriever(BaseRetriever):
    """Retriever over the Qdrant vector store."""

    def __init__(self, vector_store, embed_model, query_mode="default", similarity_top_k=2):
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)
        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))
        return nodes_with_scores
This retriever embeds the query string, searches the vector store, and returns the matching nodes along with their similarity scores.
Initialize Query Engine and Execute Query
Initialize the retriever query engine and execute a query:
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)
query_engine = RetrieverQueryEngine.from_args(retriever)
query_str = "Beeds Lake State Park"
response = query_engine.query(query_str)
print(str(response))
This code initializes the query engine with the custom retriever (the LLM comes from the global Settings configured earlier) and then executes a query with the specified query string.
By following these steps, you can effectively query and retrieve relevant information from the stored nodes in your application.
Display Map Using Streamlit and Folium
To visualize the retrieved locations on a map, we utilize Streamlit and Folium libraries. Here’s how we do it:
Install Necessary Module
pip install streamlit-folium
Import Necessary Module
Ensure you have imported the required modules: re (used below to parse coordinates), folium, streamlit, and the st_folium component from streamlit_folium:
import re

import folium
import streamlit as st
from streamlit_folium import st_folium
Define Functions for Search and Map Display
# Function to perform semantic search and retrieve latitude and longitude
def search_city(place_name):
    # Perform semantic search using LlamaIndex
    response = query_engine.query(place_name)
    if response.source_nodes:
        content = response.source_nodes[0].get_content()
        # Assumes the retrieved row text contains "longitude, latitude, ..."
        # as in us_campsites.csv; adjust the parsing for other data formats
        match = re.search(r"(-?\d+\.\d+),\s*(-?\d+\.\d+)", content)
        if match:
            return {"longitude": float(match.group(1)), "latitude": float(match.group(2))}
    return None
# Function to display map with retrieved data
def show_map(latitude, longitude, place_name):
    if latitude is not None and longitude is not None:
        # Create a folium map centered around the retrieved location
        m = folium.Map(location=[latitude, longitude], zoom_start=16)
        # Add a marker for the retrieved location
        folium.Marker([latitude, longitude], popup=place_name, tooltip=place_name).add_to(m)
        # Display the map
        st_data = st_folium(m, width=700)
User Input and Retrieval
# User input for city name
place_name = st.text_input("Enter the name of a place")

# Perform semantic search and retrieve latitude and longitude when the user submits the input
if place_name:
    matched_city = search_city(place_name)
    if matched_city:
        latitude, longitude = matched_city["latitude"], matched_city["longitude"]
        st.write(f"Retrieved location for {place_name}: Latitude - {latitude}, Longitude - {longitude}")
        show_map(latitude, longitude, place_name)
    else:
        st.write("Place not found")
This setup allows users to input a place name, perform a semantic search, and visualize the retrieved location on an interactive map.
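Save the full script (for example as app.py; the filename is your choice) and launch it with Streamlit:
streamlit run app.py
Streamlit will open the app in your browser, where you can type a place name and see the matching campground plotted on the map.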
Output:
Conclusion
In this guide, we explored the combined power of Llama2, Streamlit, Folium, and Qdrant to create a campground search system. By leveraging these tools, we were able to harness the capabilities of geospatial datasets and perform advanced vector searches efficiently. We hope you found this technique useful.
The article is originally published on Medium: https://medium.com/@shaikhrayyan123/guide-to-building-a-campground-search-system-with-llama2-streamlit-folium-and-qdrant-2b28b8738306