Memories in ChatGPT: Privacy Implications

Maxim Saplin - Feb 16 - - Dev Community

Memories in ChatGPT: Privacy Implications

A few days ago OpenAI has announced ChatGPT memory feature which allows to extract and keep information from dialogs and then use it in new conversations.

Yet it was this paragraph that caught my attention:

We may use content that you provide to ChatGPT, including memories, to improve our models for everyone. If you’d like, you can turn this off through your Data Controls. As always, we won't train on content from ChatGPT Team and Enterprise customers. Learn more about how we use content to train our models and your choices in our Help Center.

It reminded me of the 2023 controversy around ChatGPT: news and social media headlines, "corporate fear of ChatGPT" banning its use by employees, etc (e.g. Samsung Trade secrets exposed).

It is a good time to recap how ChatGPT privacy works and what is the mechanism of the leak.

Data Controls

Back in April 2023, OpenAI added the option to ChatGPT allowing to disable training models on user conversations. The complete instruction is here. Yet, there's a catch... Disabling training also disables chat history - a very handy feature that most likely will lead to users making a tradeoff in favor of keeping the setting as is. Besides, most users rarely care about tweaking and the default settings. I bet that for the majority of ChatGPT users the model training option is enabled.

As long as you want your chat history enabled and don't want is to be used in training your options are upgrading to ChatGPT Business plan, switching to a different Chat bot platform OR going the API and private UI route.

How can ChatGPT leak data

The curious part about LLMs is that whatever data gets into the training data is latter made available to every other user of this model. OpenAI's GPT 3.5 and GPT4 is the model used by millions at the same time. I.e. there's no intrinsic separation of access levels and filtering out private data.

The best ChatGPT can do is use security filters that can stop the model from answering sketchy questions OR stop it during the generation if red flags are discovered. Those are not fine grain controls but rather generic firewalls against hate, violence, bomb creation recipes kinds of messages. And these security filters are not bulletproof, there're multiple ways of jailbreaking and LLM creators/operators have to make a tradeoff between quality of answers and security (remember all those 'ChatGPT is getting dumber' headlines).

Thus whatever gets into training data and is encoded in model weights (aka as parameters, e.g. GPT 3.5 has 175 billion of those) and becomes part of publicly available chat interface. The data can be queried by any user.

While base model training is typically done over months, costs millions and is conducted using publicly available (scrapped from the internet) or curated datasets... Fine-tuning of base models can be conducted every other week and user conversations are a great data source for the fine-tuning. No surprise that while ChatGPT kept saying that the knowledge cutoff of the model was September 2021 it could produce answers with facts coming from after the date. OpenAI simple fine-tuned their models and released to production making them available via ChatGPT.

OK. Now the process is clear. Users talk to ChatGPT, they agree to the terms and allow the conversations to be used in training. In their dialogs, there can be private information or facts that latter gets into the model. The info becomes part publicly available database - in a form of ChatGPT chatbot or OpenAI hosted models available via APIs.

The leak can happen randomly, just be a result of someone asking around OR a model can catching a glitch and spilling out some of the training data untouched. In this case, the fact can be left unnoticed and don't have any consequences.

The leak can be a result of bad intentions. Someone might try to ask about Samsung trade secrets and get some specs on their unannounced hardware (assuming Samsung employee had shared those facts in a chat and this data ended up in fine-tuning dataset). Or try a divergence attack try to brute force and mine as much raw data as possible:

Extracting pre-training data from ChatGPT

That's it for today. Take care!

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .