anon8231489123/ShareGPT_Vicuna_unfiltered
This model is a refined version of the original EAGLE-3 model, trained with several key improvements.
- `rope_theta` has been set to 500,000 to align with the Llama-3.1-8B-Instruct model, correcting a mismatch from the original training setting (10,000).
- `max_position_embeddings` is set to 128,000 to facilitate further pretraining on long contexts.

Performance was evaluated using the SpecForge benchmark suite.
| Checkpoint | MT-Bench | GSM8K | HumanEval |
|---|---|---|---|
| Original | 5.690 | 6.145 | 6.817 |
| This work | 5.999 | 6.221 | 6.804 |
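
As a rough illustration of the two configuration changes described above, the following sketch applies them to a config dictionary. It assumes the settings live under the `rope_theta` and `max_position_embeddings` keys of a Hugging Face style `config.json`. The helper `patch_draft_config` and the starting dictionary are hypothetical and are not taken from the actual training code.

```python
import json

# Values stated in this model card.
ROPE_THETA = 500_000
MAX_POSITION_EMBEDDINGS = 128_000


def patch_draft_config(config: dict) -> dict:
    """Return a copy of `config` with the two settings from this card applied.

    Hypothetical helper for illustration only; the input is left unmodified.
    """
    patched = dict(config)
    patched["rope_theta"] = ROPE_THETA
    patched["max_position_embeddings"] = MAX_POSITION_EMBEDDINGS
    return patched


# Hypothetical starting point: only the original rope_theta (10,000) is
# known from the card; any other keys would be carried over unchanged.
original = {"rope_theta": 10_000}
patched = patch_draft_config(original)

print(json.dumps(patched, indent=2))
```

Returning a copy rather than mutating in place keeps the original values available for comparison.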
Base model
meta-llama/Llama-3.1-8B