LLMs like GPT are trained broadly on the internet, which makes them susceptible to pulling information “out of nowhere” (confidently stating things that aren’t in any source you gave them). LangChain is an open-source framework that lets you combine GPT’s writing skills with knowledge from your own documents, for example in a Q&A bot.
Getting Started
The notion-qa repo contains a demo that uses a Notion database as the document source. We’re going to slightly modify the intake step so that we can use this repo with a regular webpage instead of Notion.
After you configure your OPENAI_API_KEY, do not continue with exporting and unzipping your Notion data. Instead, find the URL of a webpage that you’d like to use. We’ll use curl to pull down the page’s HTML and pandoc to convert it into Markdown, similar to the Notion export. In notion-qa/Notion_DB (create the folder if it doesn’t exist yet), run the following command:
curl --silent <html url> | pandoc --from html --to markdown_strict -o <name_of_file>.md
Here’s my example:
curl --silent https://www.coloradocollege.edu/offices/campusactivities/student-organizations-leadership/guidelines-for-starting-a-new-organization.html | pandoc --from html --to markdown_strict -o student_organization_handbook.md
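If pandoc isn’t installed, a rough, stdlib-only Python stand-in for the conversion step might look like the sketch below. It is not a replacement for pandoc: it keeps only headings, paragraphs and list items and drops everything else (navigation, tables, links), and the module name rough_md.py is hypothetical.

```python
# rough_md.py (hypothetical): a crude stand-in for
# `pandoc --from html --to markdown_strict`, using only the standard library.
import sys
from html.parser import HTMLParser
from pathlib import Path


class RoughMarkdown(HTMLParser):
    # Markdown prefix emitted for each block-level tag we keep.
    BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### ",
                    "p": "", "li": "- "}
    SKIP = {"script", "style", "noscript"}  # text inside these is dropped

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; becomes &
        self.lines = []      # finished Markdown blocks
        self.current = None  # text pieces of the block being collected
        self.prefix = ""
        self.skip_depth = 0  # > 0 while inside <script>/<style>/<noscript>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK_PREFIX:
            self._flush()
            self.current, self.prefix = [], self.BLOCK_PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_PREFIX:
            self._flush()

    def handle_data(self, data):
        # Text outside a kept block (or inside a skipped tag) is ignored.
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)

    def _flush(self):
        if self.current is not None:
            text = " ".join("".join(self.current).split())  # collapse whitespace
            if text:
                self.lines.append(self.prefix + text)
        self.current = None

    def close(self):
        super().close()
        self._flush()


def html_to_rough_markdown(html: str) -> str:
    parser = RoughMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: curl --silent <html url> | python rough_md.py <name_of_file>.md
    Path(sys.argv[1]).write_text(html_to_rough_markdown(sys.stdin.read()),
                                 encoding="utf-8")
```

It slots into the same pipe as before (curl --silent <html url> | python rough_md.py <name_of_file>.md), but expect to hand-check the result more carefully than pandoc’s output.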
Open the Markdown file and check that the conversion looks correct. Once it does, continue with the repo’s instructions and run python ingest.py. This takes a few minutes, depending on the size of the file.
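Before running ingest.py, a quick pre-flight check can catch an unset key or an empty conversion, and estimating how many chunks the text splits into gives a feel for why ingestion takes a while (presumably each chunk is sent off to be embedded). This is a hypothetical helper, not part of the repo: the Notion_DB folder, the **/*.md glob, and the roughly 1,500-character newline-based chunking are assumptions to check against the ingest.py you actually have.

```python
# Hypothetical pre-flight check for `python ingest.py`; not part of notion-qa.
# Assumes: key in OPENAI_API_KEY, every *.md under Notion_DB/ gets ingested,
# and text is split on newlines into chunks of about 1,500 characters.
from __future__ import annotations

import os
from pathlib import Path


def split_on_newlines(text: str, chunk_size: int = 1500) -> list[str]:
    """Greedily pack whole lines into chunks of at most chunk_size characters.
    A single line longer than chunk_size becomes its own oversized chunk."""
    chunks, current = [], ""
    for line in text.split("\n"):
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


def preflight(folder: str = "Notion_DB", chunk_size: int = 1500) -> dict[str, int]:
    """Return {file path: estimated chunk count}; raise if something is missing."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    files = sorted(Path(folder).glob("**/*.md"))
    if not files:
        raise FileNotFoundError(f"no .md files found under {folder}/")
    counts = {}
    for path in files:
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            raise ValueError(f"{path} is empty; check the pandoc conversion")
        counts[str(path)] = len(split_on_newlines(text, chunk_size))
    return counts
```

Calling preflight() from the notion-qa folder prints nothing by itself; print its return value to see one entry per Markdown file with a rough chunk count.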
Querying the results
Now you can run python qa.py "How do I get something CCSGA certified?" to ask the bot questions based on the handbook you ingested.
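To make the “based on the handbook” part concrete, here is a toy sketch of the retrieve-then-answer idea: pick the chunks most similar to the question, then put only those into the prompt along with their sources. It deliberately swaps the embedding-based search a LangChain pipeline like this one relies on for plain word-count cosine similarity, and it only builds the prompt rather than calling a model; top_chunks and build_prompt are hypothetical names.

```python
# Toy retrieve-then-prompt illustration; NOT how qa.py is implemented.
# Word-count cosine similarity stands in for real embeddings + a vector index.
import math
import re
from collections import Counter


def _vector(text):
    """Bag-of-words counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k (source, text) pairs most similar to the question."""
    q = _vector(question)
    return sorted(chunks, key=lambda c: cosine(q, _vector(c[1])), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Put the best-matching chunks, tagged with their source, into a prompt
    that asks for an answer grounded only in that context."""
    context = "\n\n".join(f"[{src}]\n{text}"
                          for src, text in top_chunks(question, chunks, k))
    return ("Answer the question using only the context below. "
            "If the answer is not there, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

Because the model only sees the retrieved context, its answer (and the source it cites) is tied to your document instead of its general training data. Real embeddings also match related wording such as “certified” and “recognized”, which simple word overlap cannot.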
The result should look something like this:
(base) ➜ notion-qa git:(master) ✗ python qa.py "How do I get something CCSGA certified?"
Answer: To get something CCSGA certified, groups must apply to become a recognized CCSGA student organization and be classified as “Active”. The CCSGA Student Life Committee will review the application and external advisors and adult volunteers must follow the guidelines set out in the Student Organization Handbook and complete an HR background check, complete Title IX training, and the Volunteer Agreement Form.
Sources: Notion_DB/student_organization_handbook.md
Cool next steps (if you’re feeling fancy)
From here, the world is your oyster! You could hook up Twilio so that you can text your bot instead of using the command line. Or you could use Autocode to connect it to a Discord bot, so that anyone in your server can query a community knowledge base.
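Either integration needs the same glue: run the bot and turn its output into a reply. A minimal sketch, assuming qa.py sits in the working directory and prints one line starting with “Answer:” and one starting with “Sources:” as in the example output above (with multiple sources comma-separated). The helper names and the 1,600-character reply limit are hypothetical; check your messaging platform’s actual limits.

```python
# Hypothetical glue between qa.py and a Twilio/Discord handler.
from __future__ import annotations

import subprocess
import sys


def parse_qa_output(output: str) -> tuple[str, list[str]]:
    """Extract the answer line and the comma-separated sources from qa.py output.
    Assumes the answer fits on the single line that starts with 'Answer:'."""
    answer, sources = "", []
    for line in output.splitlines():
        if line.startswith("Answer:"):
            answer = line[len("Answer:"):].strip()
        elif line.startswith("Sources:"):
            raw = line[len("Sources:"):]
            sources = [s.strip() for s in raw.split(",") if s.strip()]
    return answer, sources


def ask_bot(question: str, timeout: float = 120.0) -> tuple[str, list[str]]:
    """Run `python qa.py "<question>"` and return (answer, sources)."""
    result = subprocess.run(
        [sys.executable, "qa.py", question],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return parse_qa_output(result.stdout)


def format_reply(answer: str, sources: list[str], limit: int = 1600) -> str:
    """Build one text reply, truncated with '...' if it exceeds limit."""
    reply = answer + ("\nSources: " + ", ".join(sources) if sources else "")
    return reply if len(reply) <= limit else reply[: limit - 3] + "..."
```

A webhook handler would then call format_reply(*ask_bot(incoming_text)) and send the result back. Spawning a process per message is slow but keeps the bot code untouched; importing the chain directly would be faster once you outgrow this.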
Are you doing something boring, manual, and necessary with your content? Let us know; we’d love to help.