OpenAI API RAG system with Qdrant

Artur Daschevici - Jun 19 - Dev Community

Why?

OpenAI has been making it easier and easier to build out GPT agents that make use of your own data to improve the generated responses of the pretrained models.

Agents give you a way to inject knowledge about your specific proprietary data into your pipeline, without actually sharing any private information about it. You can also improve the recency of your data, which makes you less dependent on the model's training cycle.

OpenAI has improved the DX, UX and APIs since version 3.5, and has made it easier to create agents and embed your data into your custom GPTs. They have lowered the barrier to entry, which means that virtually anyone can build their own assistant that can respond to queries about their data. This is perfect for people experimenting with building products; IMO it is a very good way to enable product discovery for the masses.

Most big AI contenders on the market provide you with a toolbox of high-level abstractions and low-to-no-code solutions. The weird thing about my approach to learning is that not having some understanding of the first principles of the tech I'm using makes me feel a bit helpless, which is why I figured building my own RAG system would be a good way to figure out the nuts and bolts.

What?

I wanted a project for running my own pipeline with somewhat interchangeable parts. Models can be swapped around so that you can make the most of the latest releases, whether they are available on Hugging Face, OpenAI or elsewhere.

Because model research is moving so fast, the top contenders surpass each other almost daily. A custom pipeline lets you quickly iterate and test new models as they evolve, and just as easily roll back an experiment.

What I wound up building is a Streamlit app that uses Qdrant to index and search data extracted from a collection of PDF documents. The app is a simple chat interface where you can ask questions about the data and get responses based on a mixture of GPT-4 and the indexed data.

How?

1. Setting up the environment

  • use pyenv to manage Python versions:
   # update versions
   pyenv update
   # install any python version
   pyenv install 3.12.3 # as of writing this
   # create a virtualenv
   ~/.pyenv/versions/3.12.3/bin/python -m venv .venv
   # and then activate it
   source .venv/bin/activate

2. Install the dependencies

   # install poetry
   pip install poetry
   # install the dependencies
   poetry install

The dependencies section of the pyproject.toml file should look like this:

    ...
    [tool.poetry.dependencies]
    python = "^3.12"
    streamlit = "^1.32.1"
    langchain = "^0.1.12"
    python-dotenv = "^1.0.1"
    qdrant-client = "^1.8.0"
    openai = "^1.13.3"
    huggingface-hub = "^0.21.4"
    pydantic-settings = "^2.2.1"
    pydantic = "^2.6.4"
    pypdf2 = "^3.0.1"
    langchain-community = "^0.0.28"
    langchain-core = "^0.1.31"
    langchain-openai = "^0.0.8"
    instructorembedding = "^1.0.1"
    sentence-transformers = "2.2.2"
    ...

3. Set up the loading of the variables from a config file

  • a nice way to manage settings is to use pydantic and pydantic-settings
   from pydantic import Field, SecretStr
   from pydantic_settings import BaseSettings, SettingsConfigDict

   class Settings(BaseSettings):
       model_config = SettingsConfigDict(env_file="config.env", env_file_encoding="utf-8")
       hf_access_token: SecretStr = Field(alias="HUGGINGFACEHUB_API_TOKEN")
       openai_api_key: SecretStr = Field(alias="OPENAI_API_KEY")


This way you can load the settings from config.env, while variables already set in the environment override the ones in the file.

  • a nice extra is that you also get type checking and validation from pydantic, including SecretStr types for sensitive data (see the sketch below).
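
A quick usage sketch (the printed output illustrates pydantic's masking behavior):

    # instantiate once and import the settings object wherever it's needed
    settings = Settings()

    # SecretStr masks the value when printed or logged
    print(settings.openai_api_key)  # prints **********

    # the real value has to be requested explicitly
    api_key = settings.openai_api_key.get_secret_value()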

4. Set up the UI elements

  • Streamlit makes it quite easy to strap together a layout for your app. You have a single script that can run via the streamlit binary:
   streamlit run app.py

The gallery has many examples of the various integrations and components you can use to build your app. There are smaller components like inputs and buttons, but also more complex UI elements such as tables and charts; you even have ChatGPT-style chat elements.
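
Recent Streamlit versions (1.24+) also ship dedicated chat primitives, st.chat_input and st.chat_message. A minimal sketch, not taken from the app itself:

    import streamlit as st

    # st.chat_input renders an input box pinned to the bottom of the page;
    # st.chat_message renders a chat bubble with the given role's avatar
    if prompt := st.chat_input("Ask me anything"):
        with st.chat_message("user"):
            st.write(prompt)
        with st.chat_message("assistant"):
            st.write(f"echo: {prompt}")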

For our chat interface we require very few elements. Generally, creating them only takes a handful of Streamlit calls:

   import streamlit as st
   ...
   def main():
       st.title("ChatGPT-4 Replica")
       st.write("Ask me anything about the data")
       question = st.text_input("Ask me anything")
       if st.button("Ask"):
           st.write("I'm thinking...")
           response = get_response(question)
           st.write(response)
   ...
   main()

The one thing I find a bit awkward is that elements which need to be conditionally displayed tend to resemble the JavaScript pyramid of doom once you have too many conditionals in the same block.

Below is a simple example so you can see what I mean:

   if len(pdf_docs) == 0:
       st.info("Please upload some PDFs to start chatting.")
   else:
       with st.sidebar:
           if st.button("Process"):
               with st.spinner("Processing..."):
                   # get raw content from pdf
                   raw_text = get_text_from_pdf(pdf_docs)
                   text_chunks = get_text_chunks(raw_text)

                   if "vector_store" not in st.session_state:
                       start = time.time()
                       st.session_state.vector_store = get_vector_store(text_chunks)
                       end = time.time()
                       # create vector store for each chunk
                       st.write(f"Time taken to create vector store: {end - start}")

This makes me think that it is probably not designed for complex UIs but rather for quick prototyping and simple interfaces.
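
For completeness, the pdf_docs in the snippet above would come from Streamlit's file uploader; a minimal sketch (the label text is my own):

    import streamlit as st

    # with accept_multiple_files=True this returns a list of UploadedFile
    # objects, which are file-like and can be handed straight to PyPDF2
    pdf_docs = st.file_uploader(
        "Upload your PDFs", type="pdf", accept_multiple_files=True
    )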

5. PDF data extraction

  • I used the PyPDF2 library to extract the text from the PDFs. The library is quite simple to use, and you can extract the text from a PDF file in a few lines of code:
    import PyPDF2

    def get_text_from_pdf(pdf_docs):
        raw_text = ""
        for pdf in pdf_docs:
            # PyPDF2 3.x removed PdfFileReader; PdfReader is the current API.
            # Streamlit's UploadedFile objects are file-like and can be passed
            # in directly.
            pdf_reader = PyPDF2.PdfReader(pdf)
            for page in pdf_reader.pages:
                # extract_text() can return None for pages without a text layer
                raw_text += page.extract_text() or ""
        return raw_text
  • The extracted text should be chunked into smaller pieces that can be used to create embeddings for the Qdrant index:
    def get_text_chunks(raw_text):
        # naive fixed-size chunking: 1000-character windows, no overlap
        text_chunks = []
        for i in range(0, len(raw_text), 1000):
            text_chunks.append(raw_text[i:i + 1000])
        return text_chunks
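
Fixed-size slicing can cut sentences in half at chunk boundaries. Since langchain is already a dependency, here is a sketch of the same step using its RecursiveCharacterTextSplitter (the 200-character overlap is my choice, not tuned):

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    def get_text_chunks_overlapping(raw_text):
        # splits on paragraph/sentence boundaries where possible and keeps
        # a 200-character overlap so context isn't lost at the seams
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        return splitter.split_text(raw_text)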

6. Setting up the Qdrant server via Docker

The best way to set up Qdrant is with Docker, and docker-compose is a nice way to keep track of the environment setup. You can bring up the Qdrant server with a simple docker-compose.yml file like the one below:

    version: '3.9'

    services:
      qdrant:
        image: qdrant/qdrant:latest
        ports:
          - "6333:6333" # expose Qdrant's REST API on port 6333 of the host
        volumes:
          - qdrant_data:/qdrant/storage # Qdrant persists its data under /qdrant/storage
        environment:
          RUST_LOG: "info" # set logging level to info

    volumes:
      qdrant_data:
        name: qdrant_data
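
After a `docker compose up -d`, a quick way to confirm the server is reachable is to list its (initially empty) collections with the Python client:

    from qdrant_client import QdrantClient

    client = QdrantClient(url="http://localhost:6333")
    # a fresh server should report an empty list of collections
    print(client.get_collections())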

7. Indexing the data

  • The Qdrant client can be used to index the embeddings and perform similarity search over the data. You can pick whichever embedding model best fits your data and swap it out if you find a better one:
    from langchain_community.embeddings import HuggingFaceInstructEmbeddings
    from langchain_community.vectorstores import Qdrant

    def get_vector_store(text_chunks, qdrant_url="http://localhost:6333"):
        embeddings = HuggingFaceInstructEmbeddings(model_name="avsolatorio/GIST-Embedding-v0", model_kwargs={"device": "mps"})
        # from_texts accepts plain strings; from_documents expects Document objects
        vector_store = Qdrant.from_texts(
            text_chunks,
            embeddings,
            url=qdrant_url,
            collection_name="pdfs",
            force_recreate=True,
        )
        return vector_store
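
To sanity-check the index you can ask Qdrant how many vectors landed in the collection:

    from qdrant_client import QdrantClient

    client = QdrantClient(url="http://localhost:6333")
    # exact=True forces a precise count instead of an estimate
    print(client.count(collection_name="pdfs", exact=True))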

8. Sending the query

To send the query to Qdrant, the question has to be embedded with the same model so that a similarity search can be run over your collection of documents.

    from langchain_community.embeddings import HuggingFaceInstructEmbeddings
    from langchain_community.vectorstores import Qdrant
    from qdrant_client import QdrantClient

    def get_response(question, qdrant_url="http://localhost:6333"):
        embeddings = HuggingFaceInstructEmbeddings(model_name="avsolatorio/GIST-Embedding-v0", model_kwargs={"device": "mps"})
        # the LangChain wrapper needs a client, the collection name and the embeddings
        vector_store = Qdrant(client=QdrantClient(url=qdrant_url), collection_name="pdfs", embeddings=embeddings)
        # similarity_search embeds the question and runs the search in one call
        return vector_store.similarity_search(question, k=1)
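
The retrieved chunks are only context; to get the GPT-4 half of the mixture mentioned earlier, they can be stuffed into a prompt for the chat model. A minimal sketch using langchain-openai (the helper name and prompt wording are my own, not from the app):

    from langchain_openai import ChatOpenAI

    def answer_with_context(question, docs):
        # concatenate the retrieved chunks into a single context block
        context = "\n\n".join(doc.page_content for doc in docs)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        llm = ChatOpenAI(model="gpt-4", temperature=0)
        # invoke returns an AIMessage; .content holds the reply text
        return llm.invoke(prompt).content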

9. Analysis

You can swap out any of the components in this project for something else. You could use Faiss instead of Qdrant, you could use OpenAI models for everything (embeddings and chat completion), or you could use open models.

You can also forego the UI and simply use FastAPI to create an API for interacting with the PDF documents. I hope this gives you some sense of the possibilities available to you when building your own RAG system.

Conclusions

  • you can build your own agent and have it respond to queries about your data quite easily
  • Streamlit is great for prototyping and building simple interfaces
  • Qdrant is good for performing similarity search on your data
  • when building RAG systems you need embedding models to encode your data
  • the embedding models are the most computationally taxing part of the pipeline
  • if your pipeline has pluggable parts, you can swap them out easily to save costs
  • pydantic and pydantic-settings are great for adding type checking and validation to your Python code