AI news
June 28, 2024

Google's Gemma 2 models with 9B and 27B parameters are finally here

Google's lightweight models Gemma 2 9B and 27B are now available to developers and researchers.

by Jim Clyde Monge

A little over a month after Google introduced Gemma 2 at Google I/O 2024, the company is finally making it available to researchers and developers worldwide. The tech giant is releasing the model in two variants: 9 billion and 27 billion parameters.

But Google isn’t stopping at just two sizes of Gemma 2. The company has announced plans to release a 2.6-billion-parameter model soon, designed to “bridge the gap between lightweight accessibility and powerful performance.”

What is Gemma 2?

Gemma 2 is available in two sizes: 9 billion and 27 billion parameters, each with standard and instruction-tuned variants.

The 9B model was trained on approximately 8 trillion tokens, while the 27B version was trained on about 13 trillion tokens of web data, code, and math.

Both models feature a context length of 8,192 tokens. The instruction-tuned variants are denoted as “gemma-2-9b-it” and “gemma-2-27b-it,” while the base models are simply “gemma-2-9b” and “gemma-2-27b.”

These lightweight models are designed to run efficiently on various hardware, including Nvidia GPUs and Google’s TPUs, making them suitable for both cloud and on-device applications.

If you want to learn more about the technical details of Gemma 2, check out this whitepaper.

You can also download Gemma 2’s model weights from platforms such as Kaggle and Hugging Face.

Developers and researchers can use Gemma 2 for free through Kaggle or via the free tier of Colab notebooks.

Try Gemma 2 in AI Studio

If you want to quickly check how the new model works and the quality of its response, try it on Google’s AI Studio.

Once logged in, you’ll notice that the new ‘Gemma 2’ model is now available in the Models dropdown list.

Here’s an example prompt I fed to Gemma 2.

Here's a blurb for an upcoming startup event. Rewrite this in a more professional tone intended for a corporate email.
"Calling all tech junkies, investors, and curious minds! 🤖🧠🎉
Ready to witness the future? We're throwing a pitch party where brilliant startups will 
showcase their mind-blowing AI creations. Think healthcare that reads your mind (almost!), 
businesses that run themselves (we wish!), and robots that make you coffee (we definitely need!).
Whether you're an investor, a budding entrepreneur, or just someone who wants to geek out over AI, 
this event is your jam! Come watch the pitches, mingle with industry experts, and get inspired by 
these rockstar companies. Oh, and did we mention free food and drinks? Register now before the robots take all the spots! 😉"

And here’s the response I got.

Subject: Invitation to [Event Name]: An AI Startup Showcase
Dear [Recipient Name],
We cordially invite you to [Event Name], a premier showcase of cutting-edge AI startups.
Join us as innovative companies present their groundbreaking solutions across diverse sectors, 
including healthcare, business automation, and robotics. This event offers a unique opportunity to:
Discover the latest advancements in artificial intelligence.

Network with industry leaders, investors, and fellow tech enthusiasts.

Gain insights into the future of AI and its transformative potential.

Complimentary refreshments will be served.
We encourage you to register at your earliest convenience as space is limited.
Sincerely,
The [Event Name] Team
[Website Link]

You can also try Gemma 2 in Ollama. Download Ollama from https://ollama.com.

Then run the model via the command:

ollama run gemma2

This will download about 5.5 GB of model weights. Once the download completes, you can immediately chat with Gemma 2 in the terminal.
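If you’d rather script the conversation than chat in the terminal, here’s a minimal sketch using the ollama Python client. A couple of assumptions on my part: you’ve installed the client with pip install ollama, the Ollama server is running locally, and the gemma2 model has already been pulled.

# Minimal sketch: chat with the locally pulled Gemma 2 model through
# the ollama Python client. Assumes `pip install ollama`, a running
# Ollama server, and that `ollama run gemma2` has pulled the weights.
import ollama

response = ollama.chat(
    model="gemma2",  # the "gemma2:27b" tag should select the 27B variant
    messages=[{"role": "user", "content": "Summarize Gemma 2 in one sentence."}],
)
print(response["message"]["content"])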

Try Gemma 2 in HuggingChat

If you don’t want to download the model to your local PC, try it for free on HuggingChat. Just set the Current Model to “google/gemma-2-27b-it”.

You can also turn on the Search web feature to allow the model to search the internet.
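If you prefer to query the hosted model programmatically rather than through the HuggingChat UI, here’s a hedged sketch using the InferenceClient from huggingface_hub. It assumes the model is served on Hugging Face’s Inference API and that you have a valid access token in an HF_TOKEN environment variable (both assumptions, not something confirmed by Google’s announcement).

# Hedged sketch: query gemma-2-27b-it via Hugging Face's Inference API.
# Assumes the model is available there and HF_TOKEN holds a valid token.
import os
from huggingface_hub import InferenceClient

client = InferenceClient("google/gemma-2-27b-it", token=os.environ["HF_TOKEN"])
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain Gemma 2 in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)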

Gemma 2 with Hugging Face Transformers

To use Gemma models with transformers, make sure to use the latest transformers release:

pip install "transformers==4.42.1" --upgrade

Here’s sample code showing how to use gemma-2-9b-it with transformers.

from transformers import pipeline
import torch

# Build a text-generation pipeline that loads the model in bfloat16 on the GPU
pipe = pipeline(
    "text-generation",
    model="google/gemma-2-9b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Chat-style input: a list of role/content messages
messages = [
    {"role": "user", "content": "What's the meaning of life?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,  # cap the length of the generated reply
    do_sample=False,     # greedy decoding for a deterministic answer
)
# The pipeline returns the whole conversation; the last message is the reply
assistant_response = outputs[0]["generated_text"][-1]["content"]
print(assistant_response)

In bfloat16, the 9B model needs about 18 GB of GPU memory, which fits on high-end consumer GPUs. The same snippet works for gemma-2-27b-it, which, at roughly 56 GB, makes it a very interesting model for production use cases.
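If 56 GB is more than your hardware can offer, one option (not part of Google’s announcement, so treat this as a sketch) is to load the 27B model in 4-bit with bitsandbytes, which should shrink the weights enough to fit on a single ~24 GB GPU. It assumes you’ve installed the extra dependencies with pip install bitsandbytes accelerate.

# Sketch: load gemma-2-27b-it in 4-bit via bitsandbytes so it fits on a
# single ~24 GB GPU. Assumes `pip install bitsandbytes accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the quantized weights
)

# Build the chat prompt with the model's template, then generate a reply
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the meaning of life?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))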

Gemma 2 Licensing

Gemma 2 comes with the same license as the first iteration, which is a permissive license that allows redistribution, fine-tuning, commercial use, and derivative works.

This open licensing approach enables developers and researchers to freely utilize and modify the model for various applications.

Final Thoughts

One disappointing aspect of Gemma 2 is its 8K-token context length. That feels far too small considering that Gemini now supports a context window of 2 million tokens.

Nevertheless, the 27B model, which reportedly beats Llama 3 70B and Claude 3 Haiku, is very interesting, and I’m excited to try it out on various use cases.

I am particularly curious how this model will stack up against Meta’s upcoming Llama 3 model with 400 billion parameters. Check out a sneak preview of the upcoming model.
