If you’ve used ChatGPT or similar services, you know it’s a flexible chatbot that can help with tasks like writing emails, creating marketing strategies, and debugging code. However, it falls short when handling questions specific to certain domains or your company’s internal knowledge base.
Generic large language models (LLMs) can't address issues unique to you or your company's proprietary data because they're trained on publicly available information, not your custom data.
To build a model with domain-specific knowledge, you need to fine-tune it with your dataset. This brings several advantages: better accuracy, enhanced personalization, and greater control over sensitive data.
In this guide, you will learn how to fine-tune LLMs with proprietary data using Lamini. You will also learn to effortlessly deploy such a system using KitOps.
## Steps to fine-tuning your LLM with proprietary data
LLMs are models designed to understand human language and produce sensible output. ChatGPT and Gemini are two common examples. These general-purpose models are best suited to general questions, such as those about world history or literature. However, if you ask them something specific to your company, like “Who is responsible for project X within my company?”, they will fail to answer correctly.
Because LLMs already have a strong general grasp of language, they can be trained further on a custom dataset to instill knowledge about a specific domain or organization. This is called fine-tuning.
You can fine-tune LLMs using a private server or your computer, but tuning with Lamini is far more convenient. Lamini is an LLM platform that seamlessly integrates every step of the model refinement and deployment process, making model selection, model tuning, and inference usage incredibly straightforward. It also provides detailed logs and a UI to easily try out your fine-tuned LLM.
Let’s use Lamini and a custom dataset to fine-tune an LLM.
### Fine-tuning an LLM with Lamini
First, you’ll need to create an account with Lamini and generate an API key. You can then install the Lamini Python package using:
```bash
pip install lamini
```
Once the installation is complete, you will need to create a dataset. You can store the dataset as either a CSV or a JSON file. For convenience, you can download and use this dataset containing some quiz questions and answers.
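If you would rather build the dataset yourself, the following is a minimal sketch that writes question/answer pairs to a CSV file using only the standard library. It assumes the two columns are named `user` and `answer`, which are the keys the tuning script in this guide passes to `upload_file`. The helper `write_dataset` and the rows in `EXAMPLE_ROWS` are hypothetical placeholders, not part of the downloadable dataset.

```python
import csv
from pathlib import Path

# Illustrative rows only (an assumption); replace them with your own data.
EXAMPLE_ROWS = [
    {"user": "What is the chemical symbol for gold?", "answer": "Au"},
    {"user": "How many continents are there?", "answer": "Seven"},
]


def write_dataset(rows, path):
    """Write question/answer rows to a CSV file with 'user' and 'answer' columns."""
    path = Path(path)
    # Create the parent folder (for example, data/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["user", "answer"])
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling `write_dataset(EXAMPLE_ROWS, "data/dataset.csv")` would produce a file in the location the rest of this guide expects.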
Now, write a Python script that uses the custom dataset to fine-tune the Meta-Llama-3.1-8B-Instruct model. You can use any model, but Llama-3.1-8B-Instruct is lightweight and available on the Lamini platform, so it is quick to download and fine-tune. The Python script, `tuning.py`, should contain the following code:
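```python
# code/tuning.py
from lamini import Lamini

# Select the base model to fine-tune and authenticate with your API key.
llm = Lamini(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    api_key="<YOUR_API_KEY>",
)

# Upload the CSV; the "user" column is the input and "answer" is the expected output.
dataset_id = llm.upload_file("data/dataset.csv", input_key="user", output_key="answer")

# Start the fine-tuning job on the uploaded dataset.
llm.tune(data_or_dataset_id=dataset_id)
```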
Note: Ideally, you will want to set your API key as an environment variable and load it within the code in your Python file. Never expose your API key to external users.
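One way to follow this advice is sketched below. It assumes the key is stored in an environment variable named `LAMINI_API_KEY`. Both that name and the helper `load_api_key` are assumptions made for illustration, so adjust them to match your setup.

```python
import os


def load_api_key(env_var="LAMINI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name your deployment sets.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return api_key
```

You could then pass `api_key=load_api_key()` when constructing the `Lamini` object instead of hard-coding the key in the script.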
The training process takes some time. Once it completes, you can view the evaluation results, logs, and metrics in the Lamini tuning dashboard. You can also use Lamini’s playground to chat with the tuned model.
Each fine-tuning job results in an updated model with a unique ID. After the tuning process is complete, you can view the Model ID of the trained model; you will use it during inference.
Finally, to use the fine-tuned model, create a new script, `test.py`, and add the following code:
```python
# code/test.py
from lamini import Lamini


# Build a Llama 3.1 Instruct prompt (no system prompt) and query the tuned model.
def get_answer(question, model_id):
    prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    prompt += question
    prompt += "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    llm = Lamini(
        model_name=model_id,  # the Model ID of your fine-tuned model
        api_key="<YOUR_API_KEY>",
    )
    return llm.generate(prompt, output_type={"Response": "str"})["Response"]


# Run the example only when the file is executed directly, not when imported.
if __name__ == "__main__":
    question = "What is the molecular formula of water?"
    model_id = "c3ed7866509951fd6a749ea018daaeb32a5f051b576c1d44a72775f85a8090bc"
    print(get_answer(question, model_id))
```
Note that before sending the user’s question to the LLM, you need to structure your prompt as required by the specific model, in this case, the Meta-Llama-3.1-8B-Instruct model. The prompt template for the model looks like this:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|> # optional
{{ system_prompt }} # optional
<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
The list below defines the meaning of the special tokens in the prompt:
- `<|begin_of_text|>`: Specifies the start of the prompt.
- `<|end_of_text|>`: The model will cease to generate more tokens. This token is generated only by the base models.
- `<|eot_id|>`: End of turn.
- `<|start_header_id|>` and `<|end_header_id|>`: These tokens enclose the role for a particular message. The possible roles are `system`, `user`, `assistant`, and `ipython`.
The three `prompt` lines at the start of the `get_answer` function in the `test.py` file define this structure (without the system prompt). For a different model, say mattshumer/Reflection-Llama-3.1-70B, you will need to look at its model card on Hugging Face and find its prompt template.
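To make the template easier to reuse, here is a small sketch of a helper that assembles a Llama 3.1 Instruct prompt with an optional system message. The function name `build_llama31_prompt` is hypothetical, and the layout is an assumption based on the template and token list above: `<|begin_of_text|>` first, then each message wrapped in its role header and closed with `<|eot_id|>`.

```python
def build_llama31_prompt(user_message, system_prompt=None):
    """Assemble a single-turn prompt in the Llama 3.1 Instruct format."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        # The system block is optional and always comes before the user turn.
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            + system_prompt
            + "<|eot_id|>"
        )
    parts.append(
        "<|start_header_id|>user<|end_header_id|>\n\n" + user_message + "<|eot_id|>"
    )
    # End with an open assistant header so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Keeping the template in one function means that switching to a model with a different prompt format only requires changing this helper.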
The lines after the prompt template initialize the LLM using the `Lamini` class, API key, and Model ID, and generate a response using the `generate()` method from the `Lamini` class.
You can use the `get_answer` function to invoke the trained model. You can also use it to create an API using FastAPI or Flask.
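As a rough illustration of wrapping the function in an API, the sketch below uses a plain WSGI application from the Python standard library as a dependency-free stand-in for FastAPI or Flask. The factory `make_app`, the JSON field names `question` and `answer`, and the error messages are all assumptions made for this example. `answer_fn` is expected to have the same signature as `get_answer(question, model_id)`.

```python
import json
from io import BytesIO  # only needed to build fake requests when testing


def _json_response(start_response, status, payload):
    """Send a JSON body with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def make_app(answer_fn, model_id):
    """Return a WSGI app that answers POSTed questions using answer_fn."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            return _json_response(
                start_response, "405 Method Not Allowed", {"error": "use POST"}
            )
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            question = payload["question"]
        except (ValueError, KeyError, TypeError):
            # Covers invalid JSON, a missing field, and non-object payloads.
            return _json_response(
                start_response,
                "400 Bad Request",
                {"error": "expected JSON with a 'question' field"},
            )
        answer = answer_fn(question, model_id)
        return _json_response(start_response, "200 OK", {"answer": answer})

    return app
```

To try it locally, you could serve the app with `wsgiref.simple_server.make_server("", 8000, make_app(get_answer, model_id)).serve_forever()`. In a FastAPI or Flask project, the same `get_answer` call would sit inside a route handler instead.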
At this point, your directory structure should look like the one shown in the snippet below:
```
.
├── code
│   ├── test.py
│   └── tuning.py
└── data
    └── dataset.csv
```
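Before packaging, it can help to confirm that the layout really matches. The following is a small sketch that reports which of the three expected files are missing. The list `EXPECTED_FILES` mirrors the tree above, and the helper `missing_files` is a hypothetical name introduced here.

```python
from pathlib import Path

# Relative paths taken from the directory tree shown in this guide.
EXPECTED_FILES = ["code/test.py", "code/tuning.py", "data/dataset.csv"]


def missing_files(project_root="."):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_files()` means the project folder has everything the later steps rely on.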
Now that you have a tuned model, you will need to deploy it.
## Deploying the fine-tuned model
There are many options for deploying your model. You can use managed services and platforms such as SageMaker, MLflow, and Weights & Biases, or you can build a separate deployment pipeline of your own. Whichever tool you choose, make sure it is compatible with other open source tools and protects user data.
- Compatibility: Machine learning engineers and data scientists use numerous tools (libraries and frameworks) to experiment with, train, and deploy machine learning models. The deployment tool should therefore support the variety of tools used for training and experimentation.
- Data security: Data stored within the organization must not leak to external users. Leakage of personally identifiable information is a serious issue in industries such as healthcare and insurance; it can tarnish a company’s reputation and make it difficult for users to trust the organization.
KitOps, an open source tool, packages its artifacts (code, data, and model) as a ModelKit, which stores assets as Open Container Initiative (OCI)-compatible artifacts. This makes them compatible with nearly every development and deployment tool in use today. Some commonly used tools include Git, Docker, Airflow, Data Version Control (DVC), and cloud services (AWS, Azure, Google Cloud).
Furthermore, KitOps makes it easy to link a trained model with the data it was trained on. It packages the contents in a ModelKit so that all the artifacts (model, code, dataset, etc.) added to it are tamper-proof, and anyone can quickly and easily verify when something may have changed. This enhances the security of the associated data.
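To give an intuition for how changes can be detected, the sketch below records a SHA-256 digest for every file in a folder and compares two such records. This is not KitOps code and does not describe its internals. It only illustrates the general idea of content digests, which OCI artifacts use to identify their contents. The helper names `file_digest`, `build_manifest`, and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or modified between two manifests."""
    names = old_manifest.keys() | new_manifest.keys()
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Because a digest depends on every byte of a file, any edit to the dataset or code after the manifest is recorded shows up as a mismatch.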
Let’s use KitOps to deploy our fine-tuned LLM.
- Install Kit and log in to an OCI registry.
```bash
# Log in to ghcr.io
kit login ghcr.io -u github_user -p personal_token
# Log in to Docker Hub
kit login docker.io --password-stdin -u docker_user
```
- Freeze the external requirements using:
```bash
pip freeze > code/requirements.txt
```
- Write a [Kitfile](https://kitops.ml/docs/kitfile/kf-overview.html) to package your code and data.
```yaml
# Kitfile
manifestVersion: 1.0
package:
  authors:
    - Jozu
  description: Deploying LLM using Lamini and KitOps
  license: Apache-2.0
  name: tunedLLM
datasets:
  - description: Quiz questions and answers used for fine-tuning
    name: training data
    path: ./data
code:
  - description: Code for fine-tuning and inference using Llama 3.1 8B model
    path: ./code
```
- Package the contents into a ModelKit using:
```bash
kit pack . -t YOUR_CONTAINER_REGISTRY_URL/APP_NAME:TAG
```
- Push the ModelKit to your registry using `kit push`:
```bash
kit push YOUR_CONTAINER_REGISTRY_URL/APP_NAME:TAG
```
- Unpack the code from your remote registry to your deployment server and deploy it.
```bash
# Unpack only the code
kit unpack YOUR_CONTAINER_REGISTRY_URL/APP_NAME:TAG --code
# Install requirements
pip install -r code/requirements.txt
# Try it out
python code/test.py
```
You can also import `get_answer(question, model_id)` into the Python file that implements your API. If you pass the question as a string and provide the correct `model_id`, you will get a response from the fine-tuned model hosted on the Lamini platform.
![Deployed fine-tuned LLM](https://paper-attachments.dropboxusercontent.com/s_09FC299EC6A33A24C103E8D006DB68E9CBCDEAC29AD97C592047EC0A79E25EA2_1727040315819_gif.gif)
## Conclusion
By fine-tuning LLMs using proprietary data, businesses can create AI solutions that are more personalized, precise, and secure. Platforms like Lamini ease the fine-tuning process, while tools such as KitOps help maintain security, compliance, and compatibility across different platforms.
You can learn more about KitOps and ModelKits by exploring our [blog](https://kitops.ml/blog.html) and our comprehensive [documentation](https://kitops.ml/docs/overview.html). If you have further questions, you can contact us directly on [Discord](https://discord.com/invite/Tapeh8agYy).