Introducing Llama 3, Meta's new AI
William Karkegi
Meta is excited to introduce Meta Llama 3, the next generation of their state-of-the-art open large language model. Llama 3 will soon be available on major platforms such as AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake.
It will also be supported by hardware manufacturers like AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
Llama 3 offers enhanced capabilities compared to its predecessors. The new models, with 8 billion (8B) and 70 billion (70B) parameters, demonstrate cutting-edge performance across a wide range of industry benchmarks.
These models are pre-trained and fine-tuned with instructions, supporting a variety of uses, including reasoning, code generation, and instruction following.
Meta's goal with Llama 3 is to create the best open models, comparable to the best proprietary models available today.
They have taken developer feedback into account to increase the overall utility of Llama 3 while continuing to play a leading role in the responsible use and deployment of LLMs.
They aim to make Llama 3 multilingual and multimodal, with longer context windows and improved performance in the core capabilities of LLMs.
Meta is committed to developing Llama 3 responsibly. They provide trust and safety tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2 to help users use this model ethically and securely.
Llama 3 uses a decoder-only transformer architecture. Its tokenizer has a vocabulary of 128,000 tokens, which encodes language considerably more efficiently than Llama 2's.
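To see why a larger vocabulary encodes text more efficiently, here is a toy sketch (this is not Llama 3's actual BPE tokenizer, and the vocabularies are made up for illustration): a greedy longest-match tokenizer produces fewer tokens for the same text when the vocabulary contains longer pieces.

```python
def tokenize(text, vocab):
    """Greedily match the longest vocabulary entry at each position;
    fall back to a single character when nothing matches."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(len(text) - i, max_len), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

small_vocab = {"th", "in", "er", "an"}  # short pieces only
large_vocab = small_vocab | {"the", "token", "izer", "learn", "ing"}

text = "thetokenizerlearning"
print(len(tokenize(text, small_vocab)))  # many mostly single-character tokens
print(len(tokenize(text, large_vocab)))  # far fewer tokens for the same text
```

Fewer tokens per sentence means more text fits in the same context window and each training or inference step covers more content, which is the efficiency gain the larger vocabulary buys.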
Meta pre-trained Llama 3 on more than 15 trillion (15T) tokens from public sources. Their dataset is seven times larger than that used for Llama 2 and includes four times more code.
Additionally, over 5% of Llama 3's training dataset consists of high-quality non-English data covering more than 30 languages.
Meta developed detailed scaling laws to assess downstream performance and optimize the use of their training resources.
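The core idea behind a scaling law can be sketched in a few lines. Assuming loss follows a power law L(C) = a · C^(−b) in training compute C, taking logarithms turns it into a straight line that can be fit by least squares and then extrapolated to larger budgets. The data points below are synthetic, purely for illustration; they are not Meta's measurements.

```python
import math

# Synthetic (compute, loss) observations: loss falls as compute grows.
compute = [1e20, 1e21, 1e22, 1e23]   # training FLOPs (made up)
loss    = [2.8, 2.3, 1.9, 1.6]       # evaluation loss (made up)

# Least-squares line fit in log-log space: log L = log a - b * log C.
xs = [math.log(c) for c in compute]
ys = [math.log(l) for l in loss]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
b = -slope                            # power-law exponent
a = math.exp(mean_y - slope * mean_x)

def predict_loss(c):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * c ** (-b)

print(f"fitted exponent b = {b:.3f}")
print(f"predicted loss at 1e24 FLOPs = {predict_loss(1e24):.2f}")
```

A fit like this lets a lab predict, before committing the budget, roughly how much a larger training run should improve downstream performance, which is how such laws guide resource allocation.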
Their 8B and 70B parameter models continue to improve even after being trained on 15T tokens. To train their largest models, they combined three types of parallelism (data, model, and pipeline), achieving over 400 TFLOPS of compute throughput per GPU.
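As a back-of-the-envelope check on that throughput figure: assuming the training cluster used NVIDIA H100 GPUs with a dense BF16 peak of roughly 989 TFLOPS (an assumption; the figure above does not state the hardware peak), 400 TFLOPS of achieved throughput would correspond to about 40% model FLOPs utilization (MFU), which is strong for training at this scale.

```python
# Hypothetical utilization estimate; the peak value is an assumption.
achieved_tflops = 400.0   # reported per-GPU throughput
peak_tflops = 989.0       # assumed NVIDIA H100 dense BF16 peak

mfu = achieved_tflops / peak_tflops   # model FLOPs utilization
print(f"estimated MFU = {mfu:.0%}")
```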
To unlock the full potential of their pre-trained models in chat use cases, Meta innovated their instruction fine-tuning approach, combining supervised fine-tuning, rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO).
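Of those techniques, DPO is simple enough to sketch for a single preference pair. Following the DPO objective of Rafailov et al. (2023), the loss rewards the policy for preferring the chosen answer more strongly than the reference model does; the log-probabilities below are toy values, not Llama 3 outputs.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    for one (chosen, rejected) preference pair."""
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy values: the policy has shifted toward the chosen answer relative
# to the reference, so the loss drops below -log(0.5) ~ 0.693.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0)
print(f"{loss:.4f}")
```

Unlike PPO, this formulation needs no separate reward model or sampling loop, which is one reason labs combine it with the other techniques rather than relying on PPO alone.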
They also conducted internal and external red-teaming exercises to evaluate and strengthen the safety of their models.
Llama 3 will soon be available on all major platforms, including cloud providers, model APIs, and more.
Meta has integrated their latest models into Meta AI, available on Facebook, Instagram, WhatsApp, Messenger, and the web. You can also try multimodal Meta AI on their Ray-Ban Meta smart glasses.
Meta firmly believes that openness leads to better products, faster innovation, and a healthier market. They are excited to see all the amazing creations you will develop with Meta Llama 3.
For more information, visit the Llama 3 website and check out their getting started guide.