Llama 3 has just been rolled out, exactly nine months after the release of Llama 2. It is already available for chat on Meta's website and can be downloaded from Hugging Face in safetensors or GGUF format.
While the previous generation was trained on a dataset of 2 trillion tokens, the new one was trained on 15 trillion tokens.
What is fascinating is how the smaller 8B version outperforms the bigger previous-gen 70B model in every benchmark listed on the model card:
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
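The 8B-beats-old-70B claim can be checked directly from the numbers in the table. A minimal sketch (scores copied from the model card figures above; the dictionary layout is just for illustration):

```python
# Benchmark scores from the Llama 3 model card table above.
scores = {
    "GPQA (0-shot)":        {"llama3_8b": 34.2, "llama2_70b": 21.0},
    "HumanEval (0-shot)":   {"llama3_8b": 62.2, "llama2_70b": 25.6},
    "GSM-8K (8-shot, CoT)": {"llama3_8b": 79.6, "llama2_70b": 57.5},
    "MATH (4-shot, CoT)":   {"llama3_8b": 30.0, "llama2_70b": 11.6},
}

# Verify the smaller new model leads the larger previous-gen model everywhere,
# and show the margin per benchmark.
for name, s in scores.items():
    margin = s["llama3_8b"] - s["llama2_70b"]
    assert margin > 0, f"Llama 3 8B does not lead on {name}"
    print(f"{name}: Llama 3 8B leads Llama 2 70B by {margin:.1f} points")
```

The largest gap is on HumanEval, where the 8B model scores more than twice the old 70B's result.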
Llama 3 has also doubled the context window, from 4k to 8k tokens.