Model Description

This model is a refined version of the original EAGLE-3 model, trained with several key improvements.

Features

Attention Mask Fix: Addresses an issue with the attention mask found in the original EAGLE repository. Further details are available in this pull request.
Positional Embedding Alignment: rope_theta is set to 500,000 to match the Llama-3.1-8B-Instruct model, correcting a mismatch with the original training setting (10,000).
Extended Context Length: The model was trained on data with a sequence length of 4096, an increase from the original 2048. Additionally, max_position_embeddings is set to 128,000 to facilitate further pretraining on long contexts.
Training Framework: The model was trained using the SpecForge library.
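
To illustrate why the rope_theta mismatch matters, the following minimal sketch compares the rotary inverse frequencies produced by the two base values. It uses the standard RoPE formula theta^(-2i/d); the head dimension of 128 and the helper name rope_inv_freq are assumptions made for illustration, and the sketch ignores any rope_scaling applied by the target model. It is not taken from the EAGLE or SpecForge code.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in 0..d/2-1."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


# Assumption: head dimension of 128 (hidden size 4096 split over 32 heads).
HEAD_DIM = 128

old_freqs = rope_inv_freq(HEAD_DIM, 10_000.0)    # original draft-model setting
new_freqs = rope_inv_freq(HEAD_DIM, 500_000.0)   # setting aligned with the target model

# Wavelength (in tokens) of the slowest-rotating dimension pair.
old_wavelength = 2 * math.pi / old_freqs[-1]
new_wavelength = 2 * math.pi / new_freqs[-1]

if __name__ == "__main__":
    print(f"slowest wavelength, theta=10,000:  {old_wavelength:,.0f} tokens")
    print(f"slowest wavelength, theta=500,000: {new_wavelength:,.0f} tokens")
```

With different bases, every dimension pair except the first rotates at a different rate, so a draft model trained with 10,000 would encode positions differently from a target model that uses 500,000.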

Performance was evaluated using the SpecForge benchmark suite.

Checkpoint	MT-Bench	GSM8K	HumanEval
Original	5.690	6.145	6.817
This work	5.999	6.221	6.804
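
As a reading aid, the relative change between the two checkpoints can be computed directly from the table. This is a small sketch that uses only the numbers above; the source does not state the metric's unit, so the script reports percentage change without interpreting it. The variable names are illustrative.

```python
# Scores copied from the table above (metric unit not stated in the source).
original = {"MT-Bench": 5.690, "GSM8K": 6.145, "HumanEval": 6.817}
this_work = {"MT-Bench": 5.999, "GSM8K": 6.221, "HumanEval": 6.804}

# Relative change of this checkpoint versus the original, in percent.
relative_change = {
    name: 100.0 * (this_work[name] - original[name]) / original[name]
    for name in original
}

if __name__ == "__main__":
    for name, pct in relative_change.items():
        print(f"{name}: {pct:+.2f}%")
```

The scores rise on MT-Bench and GSM8K and are marginally lower on HumanEval.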

Safetensors

Model size: 1.0B params
Tensor types: I64, BF16, BOOL
