July 24, 2024

Meta's Llama 3.1 Changes Everything

Llama 3.1 surpasses GPT-4o, Gemma 2, and Claude 3.5 Sonnet in certain benchmarks.

by Jim Clyde Monge

In April, Meta teased an open-source model designed to outperform the most powerful closed-source models from companies like OpenAI and Google.

Today, Meta has made history with the release of the largest open-source language model in the world, Llama 3.1 405B. The world now has access to state-of-the-art (SOTA) models that are free to use.

CEO Mark Zuckerberg boldly predicts that Meta AI will surpass ChatGPT to become the most widely used assistant by the end of this year.

Key takeaways:

Llama 3.1 is a family of language models with 8 billion, 70 billion, and 405 billion parameters
The 405B parameter model was trained with over 16,000 of Nvidia’s H100 GPUs and has a context window of up to 128K tokens
The models are multilingual, with support for French, German, Hindi, Italian, Portuguese, Spanish, and Thai
The 405 billion parameter model outperforms GPT-4, GPT-4o, Gemma 2, and Claude 3.5 Sonnet in some benchmarks

What is Llama 3.1?

Meta’s Llama 3.1 is a collection of pre-trained and instruction-tuned generative multilingual language models. It comes in three configurations: 8 billion, 70 billion, and 405 billion parameters.

8B: A lightweight, ultra-fast model you can run anywhere.
70B: A highly performant, cost-effective model that enables diverse use cases.
405B: Flagship foundation model driving the widest variety of use cases.
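To put those three sizes in perspective, here is a rough back-of-the-envelope estimate of how much memory the weights alone would need. The parameter counts come from the list above; the numeric precisions (bf16, int8, int4) are assumptions for illustration, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts come from the article; the precisions are assumptions.
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}
BYTES_PER_PARAM = {"bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, count in PARAMS.items():
    row = ", ".join(
        f"{precision}: {weight_memory_gb(count, size):,.0f} GB"
        for precision, size in BYTES_PER_PARAM.items()
    )
    print(f"{name:>4} -> {row}")
```

Under these assumptions the 8B model needs about 16 GB at bf16, which fits on a single high-end GPU, while the 405B model needs about 810 GB and has to be spread across many accelerators.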

Meta Releases Llama 3.1 with 405B Parameters — Image from Meta AI

The text-only models are optimized for multilingual dialogue use cases and outperform many existing open-source and closed chat models.

Llama 3.1 Model Architecture

Llama 3.1 is an auto-regressive language model using an optimized transformer architecture. The fine-tuned versions incorporate supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to ensure the model aligns with human preferences for helpfulness and safety.

Image from Meta AI

Token count refers to pretraining data only. All the models use Grouped-Query Attention (GQA) for improved inference scalability.
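Why does GQA help inference scale? In GQA, several query heads share one key/value head, so the key/value cache that has to be kept in memory for every token shrinks. The sketch below is a rough illustration, not Meta's code: the layer count, head counts, and head dimension are assumed values picked to resemble an 8B-class model, and the cache is assumed to store 2-byte values.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key/value cache for one sequence: K and V tensors for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """In GQA, consecutive query heads share a single key/value head."""
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Assumed configuration (not from the article): 32 layers, 32 query heads,
# 8 key/value heads, head dimension 128.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 32, 32, 8, 128
CONTEXT = 128 * 1024  # a 128K-token context window

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, CONTEXT)  # one K/V head per query head

print(f"GQA cache: {gqa / 2**30:.0f} GiB, full multi-head cache: {mha / 2**30:.0f} GiB")
print([kv_head_for_query_head(h, Q_HEADS, KV_HEADS) for h in range(8)])
```

With these assumed numbers, a full 128K-token context needs a 16 GiB cache with GQA versus 64 GiB if every query head kept its own keys and values, and query heads 0 to 3 map to key/value head 0 while heads 4 to 7 map to head 1.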

If you want to learn more about the technical details of Llama 3.1, check out this research paper from Meta.

New Capabilities of Llama 3.1

This colossal language model introduces new capabilities, including:

Longer context window
Multilingual input and output
Possible integration with third-party tools

Llama 3.1 supports seven languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai.

Check out the table for the multilingual benchmarks.

While Llama may generate text in other languages, these outputs may not meet the performance thresholds for safety and helpfulness. Meta strongly advises developers to avoid using this model for conversations in unsupported languages without implementing fine-tuning and system controls.

Image Generation Capability

Meta AI has introduced a new “Imagine Me” feature that scans your face using your phone’s camera, allowing you to insert your likeness into AI-generated images.

By capturing your likeness directly through the camera instead of using photos from your profile, Meta aims to prevent the creation of deepfakes.

Meta AI can also turn your generated still images into animations and let you edit the images you generate by adding, removing, or changing elements.

Performance Benchmarks

Based on the benchmarks below, Meta’s Llama 3.1 models surpass OpenAI’s GPT-4o and other popular language models in various tests, setting a new standard in several key areas of AI performance.

Meta also performed human evaluation of Llama 3.1 against GPT-4, GPT-4o, and Claude 3.5 Sonnet. Here are the results:

Left: Comparison with GPT-4.
Middle: Comparison with GPT-4o.
Right: Comparison with Claude 3.5 Sonnet.

Image from Meta AI

All results include 95% confidence intervals and exclude ties.
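For readers curious what "95% confidence intervals, excluding ties" means in practice, here is a minimal sketch. It assumes a simple normal-approximation interval; Meta's exact statistical method is not described here, and the counts below are made up purely for illustration.

```python
import math


def win_rate_ci(wins, losses, z=1.96):
    """Win rate over non-tied comparisons, with a normal-approximation interval."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one non-tied comparison")
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only; these are not Meta's data.
wins, ties, losses = 240, 520, 200
rate, low, high = win_rate_ci(wins, losses)  # ties are dropped before computing the rate
print(f"win rate {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

If the interval includes 50%, the evaluation cannot reliably say which model people preferred, which is why the intervals matter as much as the headline win rate.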

Try it Yourself

Llama 3.1 is now available in the Groq Playground.

Although the 405B parameter model is not currently available in the playground, you can try it on Groq Chat.

Image from Meta AI

The latest models are available in Meta AI, but only in selected countries.

Image from Meta AI

According to Meta's announcement: "We’re rolling out Meta AI in English in more than a dozen countries outside of the US. Now, people will have access to Meta AI in Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia and Zimbabwe, and we’re just getting started."

How to Get the Model

You can download the models from these three websites:

Additionally, you can access the Prompt Guard and Llama Guard models from their respective repositories. Prompt Guard models are tunable models designed to prevent prompt-injection attacks, while Llama Guard models provide input and output guardrails for LLM deployments, based on MLCommons policy.
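To show what "input and output guardrails" means in a deployment, here is a minimal sketch of the pattern: check the prompt before it reaches the model, then check the reply before it reaches the user. Everything here is hypothetical. The generate and is_safe functions are toy stand-ins, not the real Llama 3.1, Prompt Guard, or Llama Guard interfaces, and a keyword check is nothing like a real safety classifier.

```python
from typing import Callable


def guarded_chat(
    prompt: str,
    generate: Callable[[str], str],
    is_safe: Callable[[str], bool],
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Check the prompt before generation and the reply after it."""
    if not is_safe(prompt):  # input guardrail
        return refusal
    reply = generate(prompt)
    if not is_safe(reply):  # output guardrail
        return refusal
    return reply


# Hypothetical stand-ins: a real deployment would call the language model
# and a guard model here instead of these toy functions.
def toy_generate(prompt: str) -> str:
    return f"Echo: {prompt}"


def toy_is_safe(text: str) -> bool:
    return "forbidden" not in text.lower()


print(guarded_chat("Hello there", toy_generate, toy_is_safe))
print(guarded_chat("Tell me something forbidden", toy_generate, toy_is_safe))
```

The first call passes both checks and returns the echoed reply; the second is stopped at the input check and returns the refusal message.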

Final Thoughts

Open-source AI is a big deal. This openness means more ideas and innovations from developers all over the world. It’s a stark contrast to closed-source models that limit access and creativity.

But benchmarks don’t always reflect real-world performance.

Even though benchmarks show Llama 3.1’s impressive capabilities, we’ll only see its true potential through real-world use by the community. With more people using and improving these models, we can expect exciting new AI tools and applications in the future.

