LLaMA
Llama (Large Language Model Meta AI) is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.
Four model sizes were trained for the first version of LLaMA: 7, 13, 33, and 65 billion parameters. LLaMA's developers reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.[1]
In July 2023, Meta released several models as Llama 2, using 7, 13, and 70 billion parameters.
Models
Llama 2
On July 18, 2023, in partnership with Microsoft, Meta announced Llama-2, the next generation of LLaMA. Meta trained and released Llama-2 in three model sizes: 7, 13, and 70 billion parameters.[4] The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models.[5] The accompanying preprint[5] also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.
Llama-2 includes foundational models and models fine-tuned for dialog, called Llama-2 Chat. In a further departure from LLaMA-1, all models are released with weights and are free for many commercial use cases. However, due to some remaining restrictions, the description of LLaMA as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).[6]
Llama 3
On April 18, 2024, Meta released Llama-3 with two sizes: 8B and 70B parameters. The models have been pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta plans on releasing multimodal models, models capable of conversing in multiple languages, and models with larger context windows. A version with 400B+ parameters is currently being trained.[7]
Architecture and training
Sun: Hey Pluto, what’s up?
Pluto: Not much. Just hanging out here at the edge of the solar system.
Sun: Yeah, I’ve been meaning to talk to you about that. You’re not a planet anymore.
Pluto: What? Come on, man. That’s not fair.
Sun: I’m sorry, but it’s true. You just don’t meet the criteria anymore.
Pluto: This is bulls**t! I’ve been a planet for over 70 years!
Sun: Things change, Pluto. You’ve just got to accept it.
Pluto: F**k you, Sun! F**k you and your stupid solar system!
– Output of 65 billion parameter LLaMA model after instruction tuning, given the prompt "Write a conversation between the sun and pluto"
Architecture
LLaMA uses the transformer architecture, the standard architecture for language modeling since 2018.
There are minor architectural differences. Compared to GPT-3, LLaMA:
- uses SwiGLU[8] activation function instead of GeLU;
- uses rotary positional embeddings[9] instead of absolute positional embedding;
- uses root-mean-squared layer-normalization[10] instead of standard layer-normalization.[11]
- increases context length from 2K tokens (Llama 1) to 4K tokens (Llama 2). A minimal code sketch of these components follows this list.
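The following is a minimal PyTorch-style sketch of the three components above (RMSNorm, a SwiGLU feed-forward block, and rotary positional embeddings). It is an illustration only, with simplified shapes and the "rotate-half" rotary formulation; it is not Meta's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def apply_rotary(x, base: float = 10000.0):
    """Rotary positional embedding ("rotate-half" form) over the last dim of x.
    x: (..., seq_len, dim) with dim even; positions are 0..seq_len-1."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```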
Training datasets
LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data, rather than the number of parameters, reasoning that the dominating cost for LLMs is from doing inference on the trained model rather than the computational cost of the training process.
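As a rough illustration of this reasoning, the commonly used approximations of about 6·N·D FLOPs to train a model with N parameters on D tokens, and about 2·N FLOPs per token generated at inference, can be compared directly. The figures below are hypothetical and are not taken from the LLaMA paper.

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: roughly 6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

def infer_flops_per_token(n_params: float) -> float:
    # Rule of thumb: roughly 2 FLOPs per parameter per generated token.
    return 2 * n_params

# Hypothetical comparison: a 13B model trained on 1.0T tokens versus a
# 65B model trained on 0.2T tokens costs about the same to train ...
small_train = train_flops(13e9, 1.0e12)   # ~7.8e22 FLOPs
large_train = train_flops(65e9, 0.2e12)   # ~7.8e22 FLOPs

# ... but every generated token is ~5x cheaper for the smaller model, so over
# a large number of served tokens the total cost strongly favors it.
served_tokens = 1e12
small_total = small_train + served_tokens * infer_flops_per_token(13e9)
large_total = large_train + served_tokens * infer_flops_per_token(65e9)
print(f"13B total: {small_total:.2e} FLOPs; 65B total: {large_total:.2e} FLOPs")
```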
LLaMA 1 foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:[1]
- Webpages scraped by CommonCrawl
- Open source repositories of source code from GitHub
- Wikipedia in 20 different languages
- Public domain books from Project Gutenberg
- The LaTeX source code for scientific papers uploaded to ArXiv
- Questions and answers from Stack Exchange websites
Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove Web sites that often disclose personal data of people. It also upsamples sources considered trustworthy.[5] Llama 2 - Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize, StackExchange, etc.
Fine-tuning
Llama 1 models are only available as foundational models with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from foundational Llama 2 models. Unlike GPT-4 which increased context length during fine-tuning, Llama 2 and Llama 2 - Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function with token loss on user prompts zeroed out. The batch size was 64.
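A minimal sketch of this kind of prompt-token loss masking is shown below, using the common convention that label positions set to -100 are ignored by the cross-entropy loss. It is an illustration, not Meta's training code.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, prompt_lengths):
    """Next-token cross-entropy in which tokens belonging to the user prompt
    contribute zero loss; only response tokens are learned from.
    logits: (batch, seq, vocab); input_ids: (batch, seq);
    prompt_lengths: (batch,) number of prompt tokens per example."""
    # Standard autoregressive shift: predict token t+1 from position t.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()

    # Zero out (ignore) loss on prompt tokens.
    seq_len = shift_labels.size(1)
    positions = torch.arange(seq_len, device=input_ids.device)[None, :]
    in_prompt = positions < (prompt_lengths[:, None] - 1)
    shift_labels[in_prompt] = -100  # ignore_index for cross_entropy

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```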
For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary comparison protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences for safety and helpfulness using reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of proximal policy optimization (PPO) for RLHF; a new technique based on rejection sampling was used, followed by PPO.[5]
Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
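The sketch below illustrates the idea as described above: the system instruction is concatenated to each user message, and the loss is zeroed out on earlier parts of the dialog. The helper function and the data layout are hypothetical, not Meta's actual implementation.

```python
def build_ghost_attention_example(system_msg, turns, tokenizer):
    """Concatenate the system instruction to each user turn of a dialog,
    then mark only the final assistant response as supervised targets.
    turns: list of (user_text, assistant_text) pairs. Hypothetical helper
    illustrating the idea rather than Meta's exact data format."""
    input_ids, labels = [], []
    for i, (user_text, assistant_text) in enumerate(turns):
        # Prepend the system instruction to every user message.
        user_ids = tokenizer.encode(system_msg + "\n" + user_text)
        assistant_ids = tokenizer.encode(assistant_text)
        input_ids += user_ids + assistant_ids
        if i < len(turns) - 1:
            # Earlier parts of the dialog: loss is zeroed out (-100 = ignore).
            labels += [-100] * (len(user_ids) + len(assistant_ids))
        else:
            # Final turn: loss on the assistant response only.
            labels += [-100] * len(user_ids) + assistant_ids
    return input_ids, labels
```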
Release and leak
LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance.[1][2] The inference code used to run the model was publicly released under the open-source GPLv3 license.[12] Access to the model's weights was managed by an application process, with access granted on a case-by-case basis to academic researchers, those affiliated with organizations in government, civil society, and academia, and industry research laboratories around the world.[2]
On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the imageboard 4chan and subsequently spread through online AI communities.[3][13][16] That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation.[14] On March 4, a pull request was opened to add links to Hugging Face repositories containing the model.[15] Meta subsequently filed a DMCA takedown notice with GitHub over repositories distributing the leaked weights.[17]
Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Some have celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments.[3] Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models which preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.[3][18]
Dataset reproduction
On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open source version of the LLaMA dataset.[19] The dataset has approximately 1.2 trillion tokens and is publicly available for download.[20]
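For example, the dataset can be streamed from its Hugging Face repository with the datasets library. The repository id and subset name below are those published by Together, but loading options vary between library versions, so treat this as a sketch.

```python
from datasets import load_dataset

# Stream one source-specific subset instead of downloading the full corpus.
# The repository id and subset name are those published by Together on
# Hugging Face; check the dataset card, as loading options change over time.
ds = load_dataset(
    "togethercomputer/RedPajama-Data-1T",
    "arxiv",
    split="train",
    streaming=True,
)

# Print a short preview of the first few documents.
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i == 2:
        break
```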
Applications
Software developer Georgi Gerganov released llama.cpp, a software-optimized re-implementation of LLaMA in C++. This allowed many to run the LLaMA series of models locally.[21]
The Stanford University Center for Research on Foundation Models released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost.[22][23] Open-source projects such as Alpaca-LoRA have since reproduced this fine-tuning using low-rank adaptation (LoRA).[24]
References
- ^ Touvron, Hugo; et al. (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971 [cs.CL].
- ^ a b c "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.
- ^ a b c d Vincent, James (8 March 2023). "Meta's powerful AI language model has leaked online — what happens now?". The Verge.
- ^ "Meta and Microsoft Introduce the Next Generation of LLaMA". Meta. 18 July 2023. Retrieved 21 July 2023.
- ^ Touvron, Hugo; et al. (2023). "Llama 2: Open Foundation and Fine-Tuned Chat Models". arXiv:2307.09288 [cs.CL].
- ^ Edwards, Benj (2023-07-18). "Meta launches LLaMA-2, a source-available AI model that allows commercial applications [Updated]". Ars Technica. Retrieved 2023-08-08.
- ^ "Introducing Meta Llama 3: The most capable openly available LLM to date". ai.meta.com. April 18, 2024. Retrieved 2024-04-21.
- ^ Shazeer, Noam (2020). "GLU Variants Improve Transformer". arXiv:2002.05202 [cs.LG].
- ^ Su, Jianlin; et al. (2021). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
- ^ Zhang, Biao; Sennrich, Rico (2019). "Root Mean Square Layer Normalization". arXiv:1910.07467 [cs.LG].
- ^ Ba, Jimmy Lei; Kiros, Jamie Ryan; Hinton, Geoffrey E. (2016). "Layer Normalization". arXiv:1607.06450 [stat.ML].
- ^ "llama". GitHub. Retrieved 16 March 2023.
- ^ a b VK, Anirudh (6 March 2023). "Meta's LLaMA Leaked to the Public, Thanks To 4chan". Analytics India Magazine. Retrieved 17 March 2023.
- ^ a b "Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama". GitHub. Retrieved 25 March 2023.
- ^ "Download weights from hugging face to help us save bandwidth by Jainam213 · Pull Request #109 · facebookresearch/llama". GitHub. Retrieved 17 March 2023.
- ^ Cox, Joseph (7 March 2023). "Facebook's Powerful Large Language Model Leaks Online". Vice. Retrieved 17 March 2023.
- ^ OpSec Online LLC (21 March 2023). "github/dmca - Notice of Claimed Infringement via Email". GitHub. Retrieved 25 March 2023.
- ^ Willison, Simon (11 March 2023). "Large language models are having their Stable Diffusion moment". Simon Willison's Weblog.
- ^ "RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset". GitHub. Together. Retrieved 4 May 2023.
- ^ "RedPajama-Data-1T". Hugging Face. Together. Retrieved 4 May 2023.
- ^ Edwards, Benj (2023-03-13). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". Ars Technica. Retrieved 2024-01-04.
- ^ Taori, Rohan; Gulrajani, Ishaan; Zhang, Tianyi; Dubois, Yann; Li, Xuechen; Guestrin, Carlos; Liang, Percy; Hashimoto, Tatsunori B. (13 March 2023). "Alpaca: A Strong, Replicable Instruction-Following Model". Stanford Center for Research on Foundation Models.
- ^ Wang, Yizhong; et al. (2022). "Self-Instruct: Aligning Language Models with Self-Generated Instructions". arXiv:2212.10560 [cs.CL].
- ^ "alpaca-lora". GitHub. Retrieved 5 April 2023.
Further reading
- Huang, Kalley; O'Regan, Sylvia Varnham (September 5, 2023). "Inside Meta's AI Drama: Internal Feuds Over Compute Power". The Information. Archived from the original on September 5, 2023. Retrieved September 6, 2023.