Ever wished for a chat with AI that's all yours, away from prying eyes? That's where private chat UIs come into play. They offer a slice of the digital realm where you can enjoy conversations with AI on your terms. Let's chat about why you might want to go private and why py_chat_ui might be a good pick among the many options out there.
ChatGPT Controversy and Privacy Concerns
In early 2023, ChatGPT made news and faced scrutiny when sensitive data was exposed through its public platform, leading to bans at enterprises and even in entire countries.
Working at a large software development company, I remember the waves each ChatGPT leak story made. Around March, "don't use ChatGPT at work, ever" was a top headline and the subject of all kinds of communications and policies.
Yet, how do you benefit from GenAI and LLMs in work-related matters? In April there were two major options:
- Using open-source/self-hosted models (Vicunas, Alpacas, and other kinds of Llamas of the time)
- OpenAI web services and OpenAI models hosted by Microsoft on Azure
The first was (and still is) very expensive (compute and RAM requirements for any capable model are sky-high), and the models are inferior to proprietary ones. Open-source models are only now getting close to GPT-3.5 on various benchmarks (none has touched the performance of GPT-4) and have very small context windows (typically 2-4k tokens vs 16k in GPT-3.5 and 128k in GPT-4).
The second made more sense, as it offered access to state-of-the-art models and (unlike the free tier of ChatGPT) came with promises regarding data privacy. Both OpenAI and Azure state they don't use prompts sent to their APIs for training or improving models, so the data you put into the models can't leak into base models:
- The Azure privacy statement says that:
  - Customer data processed by Azure OpenAI is not sent to OpenAI
  - It is not used to train the OpenAI models
  - It is possible to opt out of the logging and human review process
- OpenAI's data usage policies say that no data submitted via the API is used for training or improving their models
Chat UI
Both of the above options require some UI, be it a web app or a mobile/desktop client. And there are plenty of options. If you google something like "GitHub chat ui list" you might end up with something like this - there are quite a few lists like that on the internet.
Most of the options are web apps that can be self-hosted. 99% of them can work with OpenAI's Chat Completions API, and the minimal config required is an OpenAI API key (which you can typically set via environment variables) - clone the project locally, set the API key, hit F5, and you are good to chat with GPT in a web app running at localhost.
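Under the hood most of these UIs boil down to roughly the same call. Here's a minimal sketch with the openai Python package (v1.x); the model name is just an example:

```python
# Roughly what a typical chat UI does per message: read the key from an
# environment variable and hit the Chat Completions API.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```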
Some of the options support OpenAI models hosted on Azure, others can integrate with Google Vertex or Amazon Bedrock.
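For the Azure-hosted flavor, the same openai package has an AzureOpenAI client. A sketch - the endpoint, API version, and deployment name below are placeholders, not anything specific to these projects:

```python
# Same call against an Azure OpenAI resource; note that you address
# your own *deployment* name rather than a model id.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-12-01-preview",
)

response = client.chat.completions.create(
    model="my-gpt-35-deployment",  # deployment name in your Azure resource
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```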
My first experience with a 'private chat UI' was a Next.js clone of the original ChatGPT interface called 'chatbot-ui'. It was popular (~20k stars in May), had a familiar interface, supported Azure endpoints out of the box, and was available through a public container registry (Docker Hub).
I had spare monthly credit for Azure consumption and could use the GPT-3.5 API and serverless hosting at no extra cost.
The Appeal of Going Private
Private chat UIs are about more than just keeping secrets or solving enterprise security concerns — they're about crafting an experience that's uniquely yours. Here's why you might consider one:
- Complete control: you can pick from multiple models, define the system message, set sampling parameters (e.g. top_p, temperature), and be sure that your entire conversation is given to the API with no context compression tricks behind the scenes (see the API sketch after this list)
- Different UI/UX: find something that better suits your taste, looks or feels nicer
- Cost Savings: If you've got Azure credits or other cloud perks, a private UI can be a wallet-friendly choice.
- Speedy Responses: inference through the API may give you more tokens per second than public chatbot services (ChatGPT included)
- Self-hosted Models: got a new open-source model with an inference server that speaks the OpenAI API? Hook it up to your private UI and start chatting.
- Custom tools: you can have a Python interpreter automatically executing code snippets produced by the LLM or calling some LangChain tool or custom API you might want to integrate
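To make the 'complete control' item concrete, here is a sketch of the kind of request a private UI can send on your behalf (the openai v1.x package again; the system message and parameter values are purely illustrative):

```python
# You decide the system message, the sampling parameters, and exactly
# which messages make up the context - nothing is trimmed behind your back.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    temperature=0.2,  # your choice, not a hidden default
    top_p=0.9,
    messages=[
        {"role": "system", "content": "You are a terse senior engineer."},
        # ...the full conversation history, stored and replayed as-is:
        {"role": "user", "content": "Explain context windows in one line."},
    ],
)
print(response.choices[0].message.content)
```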
Building My Own Chat UI, with Blackjack and...
The 'chatbot-ui' didn't stick for long. The project was abandoned in Spring 2023 and stayed without updates for a whole year. It lagged behind OpenAI API versions and had some nasty typing bugs on my Android phone...
At some point, I went back to those awesome lists of different chat UIs, yet none was OK for me and... I decided to build my own 'perfect' chatbot UI :)
I just needed a bare minimum of features and the purest experience. For some reason I had a hard time ticking all the boxes when reviewing other options. Here's the list of requirements I had:
- Web app, minimal clutter-free UI
- Out-of-the-box support of Azure endpoints (no need to set up gateways etc.)
- Can be configured via ENV vars or the UI
- Available via a public container registry and can be deployed on any serverless hosting in minutes
- Allows overriding the system message and defining the temperature
- Keeps chat history server-side in encrypted form (see the encryption sketch after this list)
- Can be spun up locally and pointed to an arbitrary endpoint via a single command like:

  ```bash
  docker pull ghcr.io/maxim-saplin/py_chat_ui:latest && \
  docker run -p 8501:8501 \
    -e DISABLE_AUTH=True \
    -e API_TYPE=OpenAI \
    -e OPENAI_API_BASE=https://mixtral-api.test.com/v1 \
    -e MODEL=mistralai/Mixtral-8x7B-Instruct-v0.1 \
    -e OPENAI_API_KEY=EMPTY \
    ghcr.io/maxim-saplin/py_chat_ui:latest
  ```
- And it has a token counter.
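On that encrypted-history point: I won't reproduce py_chat_ui's actual implementation here, but this is roughly the shape of at-rest encryption the requirement implies - a sketch using the cryptography package's Fernet recipe:

```python
# Encrypt chat history before it touches disk; the key lives in an
# env var / secret store, never next to the data.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice: loaded from a secret, not generated
fernet = Fernet(key)

record = b'{"role": "user", "content": "hi"}'
ciphertext = fernet.encrypt(record)    # store this server-side
plaintext = fernet.decrypt(ciphertext) # read it back on page load
assert plaintext == record
```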
The Killer Feature: Token Counter
It's like having a fuel gauge for your conversation, letting you know just how much 'chat fuel' you've got before you need to refuel (or, in this case, before you hit your token limit). That's what the token counter in py_chat_ui is like.
Let's take a detour and talk about context windows. OpenAI's GPT-4 Turbo recently flexed its muscles with a 128K context window. That's enough space to hold a small library of tweets or a cozy gathering of StackOverflow questions. Yet even a context window that feels like it stretches to the horizon is not infinite. It is still not enough to hold an average book or even the source HTML of a Google results page.
There's evidence that the advertised context window size often is not fully usable without compromising LLM performance. E.g. GPT-4 Turbo faces recall challenges once the context goes above 71K tokens (~55% of the maximum).
In-context learning and RAG (putting all relevant data inside the prompt) seem to be the major ways of making use of your own data with an LLM. Knowing the size of the conversation, or of the text snippet you are about to send, isn't just a curiosity - it tells you how close you are to the threshold beyond which the LLM goes blind to information in the prompt.
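Counting tokens is cheap to do client-side, which is all a token counter needs. A sketch with OpenAI's tiktoken library (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family):

```python
# How far is this prompt from the edge of the context window?
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/GPT-4 family encoding

conversation = "Ever wished for a chat with AI that's all yours? ..."
tokens = len(enc.encode(conversation))
context_window = 128_000  # GPT-4 Turbo's advertised limit

print(f"{tokens} tokens, {tokens / context_window:.2%} of the window used")
```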
I'm Happy with the Result
Here's the live demo of the app. I am using py_chat_ui daily, on desktop and mobile (I have a shortcut on the Home Screen).
With ~100 dialogs per month I get a ~$5 bill for the GPT-4 Turbo model and $14 for the Azure Container App hosting the web app (here's an instruction on how to deploy it to Azure). FYI, Visual Studio Professional and Enterprise subscriptions come with ~$50/~$150 of monthly credit towards Azure consumption. You can have both the model and the UI hosted there.
P.S. Built with Python and Streamlit
I have seen how people used Snowflake's Streamlit UI framework to put up proof-of-concept apps integrating LLMs at various hackathons. I contributed to one such pilot project myself. And I was impressed by how stupid-simple yet effective Python + Streamlit is.
There's a nice blog post demonstrating how one can build a chat interface in just 43 lines of code.
In a nutshell, Streamlit combines a Python back-end with a React front-end. The client communicates with the server via sockets, using Protobuf for serialization. Every user action reruns the entire server script (top to bottom). By the way, the framework makes these re-runs quite fast - even on the free tier of Streamlit Cloud, the demo app of py_chat_ui shows an average script execution time of 40-50ms.
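That rerun model takes some getting used to: the script is the app, and st.session_state is the only thing that survives between runs. Here's a minimal echo-bot sketch in the spirit of that 43-line example (the model call is a placeholder):

```python
# The whole script reruns on every interaction; st.session_state persists.
import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []  # survives across reruns

# Replay the conversation so far (rebuilt from scratch on each rerun)
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    reply = f"Echo: {prompt}"  # placeholder for an actual LLM call
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.markdown(reply)
```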
Under the hood the framework determines which parts of the UI require re-rendering, and the front-end applies the changes to the affected elements. It is somewhat reminiscent of React's Virtual DOM, yet Streamlit is very basic - there's no room for reactive features or complex state management.
The framework is also very limited in its capabilities. E.g. UI tweaks are only possible by overriding just a few properties in the provided themes (and you can't override both the Dark and Light themes at the same time). If some behaviour is not available through standard widgets or 3rd-party components, you have two paths:
- Create your own component, implementing both the Python and React parts
- Use tricks embedding JS or HTML into the page (and fight with element navigation and React managing the state).
For py_chat_ui Streamlit was enough and saved quite some time. It took me literally one weekend to build 90% of the app. And the Cursor.sh IDE was my AI coding assistant, writing 90% of the code (using my own GPT-4 Turbo endpoint from Azure).
However, maintaining the code turned out to be a challenge. My numerous workarounds tweaking CSS/HTML and embedding custom JS broke quite a few things :) Here's one of those hacks:
```python
from streamlit.components.v1 import html  # renders raw HTML/JS in an iframe

# Placeholder - the real selector is defined elsewhere in the app
stop_button_selector = 'button[data-testid="..."]'

def show_stop_generate_chat_input_js():
    # No Streamlit API for this: escape the component iframe via
    # window.parent and patch the DOM directly
    js = f"""
    <script>
        console.log("show_stop_generate_chat_input_js");
        const stopButton = window.parent.document.querySelector('{stop_button_selector}');
        if (stopButton) stopButton.style.visibility = 'hidden';
        const chatInputContainer = window.parent.document.querySelector('div.stChatInputContainer');
        if (chatInputContainer) chatInputContainer.style.visibility = 'visible';
    </script>
    """
    html(js, 0, 0, False)  # zero-size component: only the script runs
```
Next time, if I decide to build something more complex or destined to outlive a PoC, I would certainly pick a more capable stack. Yet that doesn't mean Streamlit is bad. It is a great tool for prototyping, and I like it! It delivers on its promises and makes the right tradeoffs.