What is Gemma?
Gemma is a family of 4 new LLM models by Google based on Gemini. It comes in two sizes: 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.
https://huggingface.co/blog/gemma
In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to load the model with quantization, since gemma-7b requires 18GB of GPU RAM.
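To see why quantization is needed, a rough back-of-the-envelope estimate of the weight memory helps. The sketch below is only an approximation: it assumes gemma-7b has about 8.5 billion parameters in total (an assumption, not a figure from this post), counts only the weights, and ignores activations and the generation cache. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: total parameter count of gemma-7b, including embeddings.
ASSUMED_PARAMS = 8.5e9

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

Under these assumptions the float16 weights come to about 17 GB, which is in line with the 18GB figure above, while 4-bit weights need roughly a quarter of that.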
Requirements
- Hugging Face account
- Google account
Step 1. Get access to Gemma
We can use Gemma with Transformers 4.38, but first we need to be granted access to the model.
https://huggingface.co/google/gemma-7b
Once access has been granted, the model page above shows a notice confirming that you have access.
Step 2. Add HF_TOKEN to Google Colab
We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.
First, we need to get a token from Hugging Face.
https://huggingface.co/settings/tokens
Then click the key icon in the Google Colab sidebar and add the token as a secret named HF_TOKEN.
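To confirm that the notebook can actually see the token, something like the following can be run in a cell. This is a minimal sketch: `get_hf_token` is a hypothetical helper, it assumes the secret is named HF_TOKEN and that the notebook has been given access to it, and it falls back to an environment variable when the code is not running inside Colab.

```python
import os


def get_hf_token(name="HF_TOKEN"):
    """Return the token from Colab secrets, or from the environment as a fallback."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
        return userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook.
        return os.environ.get(name)


token = get_hf_token()
# Never print the token itself; only report whether it was found.
print("HF_TOKEN found" if token else "HF_TOKEN not set")
```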
Step 3. Install packages
!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
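After installing, it can be worth checking which versions actually ended up in the runtime, since Gemma needs Transformers 4.38 or later. The following is a rough sketch using only the standard library; `parse_version`, `meets_minimum` and `installed_version` are hypothetical helper names, and the parsing is deliberately simple (it keeps only the leading numeric parts of a version string).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.38.1' into (4, 38, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("transformers", "accelerate", "bitsandbytes"):
    print(package, installed_version(package) or "not installed")

found = installed_version("transformers")
if found is not None:
    print("transformers >= 4.38:", meets_minimum(found, "4.38"))
```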
Step 4. Write Python code to run Gemma
We can use the instruction-tuned gemma-7b-it model via Transformers.
from transformers import pipeline
import torch

model = "google/gemma-7b-it"

# use quantized model: load the weights in 4-bit via bitsandbytes
# named pipe so that it does not shadow the imported pipeline() function
pipe = pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True},
    },
)

messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
# build the prompt string in Gemma's chat format
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
# generated_text starts with the prompt, so print only the newly generated part
print(outputs[0]["generated_text"][len(prompt):])
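To make the apply_chat_template step less opaque, here is a small pure-Python sketch of the prompt string it produces for Gemma, and of why slicing with len(prompt) leaves only the answer. The turn markers below reflect our understanding of Gemma's chat format and should be treated as an assumption; the tokenizer's own template is the authoritative version. `format_gemma_prompt` is a hypothetical helper, and `fake_generated_text` is a made-up stand-in for real model output.

```python
def format_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate Gemma's chat format for a list of role/content messages."""
    prompt = "<bos>"
    for message in messages:
        # Gemma calls the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        prompt += f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n"
    if add_generation_prompt:
        # Open the model's turn so that generation continues from here.
        prompt += "<start_of_turn>model\n"
    return prompt


messages = [{"role": "user", "content": "Tell me about ChatGPT"}]
prompt = format_gemma_prompt(messages)
print(prompt)

# The pipeline returns the prompt followed by the new text, so slicing
# off len(prompt) characters keeps only the model's answer.
fake_generated_text = prompt + "(model answer goes here)"
print(fake_generated_text[len(prompt):])
```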
Result
The following is the result of the above code.
Unfortunately, as you can see, the output is wrong: it claims that ChatGPT was developed by Google and that it is open source. So at this point, Gemma is either missing recent data or is simply not a good model. 🥲
ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:
Key Features:
- Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
- Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
- Conversation: It can engage in natural language conversation, answer questions, and provide information.
- Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
- Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.
Additional Information:
- Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
- Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
- Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development