This is a conversational RAG app whose RAG pipelines are built entirely as PostgreSQL procedures using PL/pgSQL!
The idea behind this app stems from my master's thesis work. I have to do a systematic literature review, and doing it manually is tedious. So I created this small app that lets me upload a full-text paper and chat with it, creating summaries, highlights, and key results, massively streamlining the systematic literature review process.
Of course, this app would work with any kind of data; we just need to tweak the system prompt a bit!😏
Key Features:
Summarize research papers (journal articles, conference papers, etc.)
Create highlights/key insights
Automatic processing using pgai Vectorizer
Chat with individual papers
Save multiple chat sessions
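The automatic processing feature above relies on pgai Vectorizer, which watches a source table and keeps chunk embeddings in sync as rows change. A minimal sketch of setting one up, assuming pgai's `ai.create_vectorizer` API and a hypothetical `papers` table with a `full_text` column (the app's actual schema may differ):

```sql
-- Hypothetical setup: tell pgai Vectorizer to embed the full_text
-- column of the papers table and store chunks + embeddings in a
-- generated papers_embeddings store. Names here are illustrative.
SELECT ai.create_vectorizer(
    'papers'::regclass,
    destination => 'papers_embeddings',
    embedding   => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking    => ai.chunking_recursive_character_text_splitter('full_text')
);
```

Once the vectorizer exists, inserting a parsed PDF into `papers` is enough; the worker picks it up and populates the embedding store without any application-side pipeline code.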
Initially I wanted to rely fully on Ollama, but pgai Vectorizer does not currently support Ollama, so I opted for OpenAI.
KawanPaper is your go-to app for chatting with documents, mainly research papers (journal articles, conference papers, etc.).
Features:
PDF upload and automatic parsing
Generate key insights from research papers
Chat with a specific paper
Setup
Make sure you have an up-to-date Docker installation, then clone this repo. We will divide the installation process into three parts: MinIO setup, database migration, and launching the app.
Configuration
Main configuration: copy the .env.example file to .env
Docker compose configuration: copy the docker.env.example to docker.env
These configs come with predefined values to make deployment easier. Note that there are some environment variables you still need to define:
.env
VITE_MINIO_ACCESS_KEY
VITE_MINIO_SECRET_KEY
docker.env
OPENAI_API_KEY
You can add your OpenAI key in docker.env; for the MinIO credentials, we will create them in the next step.
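Putting the two steps together, the files end up with entries like the following (the values shown are placeholders, not real credentials):

```shell
# .env — MinIO credentials, created in the MinIO setup step below
VITE_MINIO_ACCESS_KEY=<minio-access-key>
VITE_MINIO_SECRET_KEY=<minio-secret-key>

# docker.env — your OpenAI API key
OPENAI_API_KEY=<your-openai-api-key>
```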
MinIO Setup
This is a new thing for me, back in the day we can…
I never thought I would be writing LLM chain/pipeline using SQL instead of Haystack, LangChain, or LlamaIndex, but here we are!
It's crazy what pgai could bring in the future for LLMs in databases.
Final Thoughts
This has been an interesting journey, because the idea of running an LLM directly in the database felt really weird at first. But after learning it over the last two days, I find it really interesting, and it could revolutionize data-mining pipelines for non-AI engineers. I can imagine data analysts and researchers easily extracting insights from database systems without major changes to their existing setups.
One of my favorite experiences in this project was learning how to write Postgres procedures and functions using PL/pgSQL. It was an especially interesting journey because the kind of LLM app that I used to write with LangChain, Haystack, or LlamaIndex, I have now implemented in pure PL/pgSQL to build a conversational RAG.
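To make the idea concrete, here is a minimal sketch of what one conversational RAG step can look like in PL/pgSQL. It assumes pgai's `ai.openai_embed` and `ai.openai_chat_complete` functions, pgvector's `<=>` distance operator, and a `papers_embeddings` store produced by the Vectorizer; all table and column names are illustrative, not this app's actual schema:

```sql
-- Sketch: retrieve the chunks most relevant to a question about one
-- paper, then ask the model to answer grounded in those chunks.
CREATE OR REPLACE FUNCTION chat_with_paper(p_paper_id bigint, p_question text)
RETURNS text
LANGUAGE plpgsql
AS $$
DECLARE
    v_context text;
    v_answer  text;
BEGIN
    -- Retrieval: top-5 chunks by embedding distance to the question.
    SELECT string_agg(chunk, E'\n---\n')
      INTO v_context
      FROM (
        SELECT chunk
          FROM papers_embeddings
         WHERE paper_id = p_paper_id
         ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', p_question)
         LIMIT 5
      ) t;

    -- Generation: chat completion grounded in the retrieved context.
    SELECT ai.openai_chat_complete(
        'gpt-4o-mini',
        jsonb_build_array(
            jsonb_build_object(
                'role', 'system',
                'content', 'Answer using only this context: ' || coalesce(v_context, '')),
            jsonb_build_object('role', 'user', 'content', p_question)
        )
    ) -> 'choices' -> 0 -> 'message' ->> 'content'
      INTO v_answer;

    RETURN v_answer;
END;
$$;
```

The whole retrieve-then-generate loop is one function call away from any SQL client, which is exactly what makes this approach appealing for data analysts already living in the database.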