---
license: llama3.3
---

# Llama 3.3 8B Instruct
|
|
|
|
|
Yes, this is official, and yes, this is, to my knowledge, a real version of Llama 3.3 8B!
|
|
|
|
|
**I would highly recommend trying both this model and [a version with the Llama 3.3 70B config applied to extend the context length to 128k](/shb777/Llama-3.3-8B-Instruct). I am unsure which of the two is closest to the real release; the original copy I downloaded came with the 8k-context configuration, but benchmarks improve slightly on the 128k version.**
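For reference, the 128k extension amounts to copying the RoPE-scaling block from the 70B `config.json` into this model's config. The values below are what the public Llama 3.3 70B config uses to the best of my recollection; double-check them against the actual file before applying:

```json
{
  "max_position_embeddings": 131072,
  "rope_theta": 500000.0,
  "rope_scaling": {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192
  }
}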
|
|
|
|
|
|
|
|
## Benchmarks
|
|
|
|
|
| | Llama 3.1 8B Instruct | Llama 3.3 8B Instruct (as downloaded from Facebook) | [Llama 3.3 8B Instruct w/ Llama 3.3 70B RoPE config to extend to 128k context](/shb777/Llama-3.3-8B-Instruct) |
|-|-|-|-|
| IFEval (1 epoch, score averaged across all strict/loose instruction/prompt accuracies, following the Llama 3 paper) | 78.2 | 81.95 | 84.775 |
| GPQA Diamond (3 epochs) | 29.3 | 37.0 | 37.5 |
|
|
|
|
|
All benchmarks were run in OpenBench at temperature 1.0.
|
|
|
|
|
<details>
|
|
|
|
|
|
|
|
## Rationale
|
|
|
|
|
Facebook has a [Llama API](https://llama.developer.meta.com) available that allows for inference of the other Llama models (L3.3 70B, L4 Scout and Maverick), but *also* includes a special, new (according to the original press release) "Llama 3.3 8B" that didn't exist anywhere else and was stuck behind the Facebook API!
|
|
|
|
|
However. The Llama API supports finetuning L3.3... *and downloading the final model in HF format.* Problem solved, right?
|
|
|
|
|
Wellllllllllllllll. Not really. The finetuning API was hidden behind layers of support tickets. I tried when the original API dropped in April, and was just told "We'll think about it and send you any updates" (there never were any updates).
|
|
|
|
|
Flash forward to December: on a whim, I decided to look at the API again. And... by god... the finetuning tab was there. I could click on it and start a job (please ignore that I had no idea how it worked; in fact, the finetuning tab disappeared after the first time I clicked on it, though I could still navigate to the page manually).
|
|
|
|
|
Apparently, this was not very well tested: there were a good few bugs, the UI was janky, and the model download function did not actually work due to CORS (I had to manually curl things to get the CDN link).
|
|
|
|
|
But... by god... the zip file downloaded, and I had my slightly finetuned model.
|
|
|
|
|
To my shock and delight, however, they also provide the adapter that they merged into the model. That means I can *subtract* that adapter and recover the original model. And... here we are!
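The unmerge step is just linear algebra. Assuming the standard PEFT-style merge, W' = W + (α/r)·(B·A), subtracting the same scaled delta recovers the base weight. Here is a minimal sketch (shown with NumPy for clarity; the function name and shapes are my own, and in practice you would apply this per target module over the safetensors shards):

```python
import numpy as np

def unmerge_lora(merged_W, lora_A, lora_B, alpha, rank):
    """Undo a LoRA merge: W = W' - (alpha / rank) * (B @ A).

    Assumes the standard (PEFT-style) merge W' = W + (alpha / rank) * (B @ A),
    where A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    scaling = alpha / rank
    return merged_W - scaling * (lora_B @ lora_A)
```

Because the merge is purely additive, this recovers the base weights exactly up to floating-point error, provided you use the same `alpha` and `rank` as the adapter's `adapter_config.json`.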
|
|
|
|
|
## More cursed discoveries
|
|
|
|
|
1. Apparently the context length of the *original* Llama 3.3 model (i.e. the regular one that the Llama API serves) is 128k, while the finetunable version is only 8k. This is true across both the downloaded version *and* the version of the finetune served by the API (refer to screenshot with 10k tokens worth of `a` as input). This does not really make any coherent sense. |
|
|
 |
|
|
|
|
|
## Are you sure this is really Llama 3.3?
|
|
|
|
|
As far as I'm aware! It has stylistic tics that differ from Llama 3 and 3.1.
|
|
|
|
|
There are a few weird artifacts, however, suggesting the base model is... the original Llama 3(???)
|
|
|
|
|
The original ZIP included an `original_repo_id.json` that contained:
|
|
|
|
|
```json
{
  "repo_id": "meta-llama/Meta-Llama-3-8B-Instruct"
}
```
|
|
|
|
|
and, furthermore, the `adapter_config.json` *also* listed Llama 3 as the base model. However, the models clearly act differently and know different things! I also tested across the Llama API and my copy, and they share the same differences from L3 and L3.1.
|
|
|
|
|
Suffice it to say, I'm pretty sure this is really Llama 3.3 8B.
|
|
|
|
|
## Is this legal?
|
|
|
|
|
According to the [T&S of the Llama API as of December 29th, 2025](https://archive.is/y7KSR):
|
|
|
|
|
> For example, via the Llama API, you may receive access to the Llama 3.3 8b model, which is considered a Llama AI model and part of the Meta AI Materials; when downloaded, and not accessed via the Llama API, the Llama 3.3 8b model is subject to the Llama 3.3 Community License Agreement and Acceptable Use Policy.
|
|
|
|
|
The Llama 3.3 8b model (after downloading) is subject to the regular L3.3 license, which allows for redistribution. So... as far as I can tell, yes, this is perfectly legal to redistribute!
|
|
|
|
|
---
|
|
|
|
|
Llama 3.3 is licensed under the Llama 3.3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
|
|
|
|
|
Please email me at `fizzarolli [at] riseup.net` with any concerns. If Meta would like me to take this model down, please have someone email me and ask from an official Meta address.
|
|
|
|
|
|
|
|
|
|
|
</details>