# Eagle-3 Speculator for Llama-3.3-70B-Instruct

This is an Eagle-3 speculator checkpoint converted to the [speculators](https://github.com/neuralmagic/speculators) format.
## Model Details

- **Base Model**: meta-llama/Llama-3.3-70B-Instruct
- **Speculator Type**: Eagle-3
- **Draft Vocabulary Size**: 32,000
- **Target Vocabulary Size**: 128,256
- **Architecture**: Single-layer transformer with vocabulary mapping
- **Target Model Hidden Size**: 8,192
- **Draft Model Hidden Size**: 6,144
## Key Features

- **Vocabulary Mapping**: Maps between the draft (32K) and target (128K) vocabularies
- **Custom Attention**: Modified attention layer that accepts a 2×hidden_size input
- **Fusion Layer**: Fuses hidden states from 3 verifier layers into the draft hidden state (3×8192 → 6144)
- **Optimized for 70B Models**: Specifically configured for the Llama-3.3-70B architecture
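The fusion step above amounts to a single linear projection over the concatenated hidden states of three verifier layers. The sketch below is only an illustration of that shape bookkeeping, not the checkpoint's actual implementation: `W_fuse` is a hypothetical weight, and the dimensions are scaled down (64 and 48 stand in for the real 8,192 and 6,144) to keep the example small.

```python
import numpy as np

TARGET_HIDDEN = 64   # stands in for the verifier's 8,192
DRAFT_HIDDEN = 48    # stands in for the speculator's 6,144

rng = np.random.default_rng(0)

# Hypothetical fusion weight: projects 3 concatenated verifier states
# down to the draft model's hidden size (real shape: 3*8192 x 6144).
W_fuse = rng.standard_normal((3 * TARGET_HIDDEN, DRAFT_HIDDEN))

batch, seq = 2, 5
# Stand-ins for hidden states taken from three layers of the verifier
layer_states = [rng.standard_normal((batch, seq, TARGET_HIDDEN)) for _ in range(3)]

# Concatenate along the feature axis, then project: (2, 5, 192) -> (2, 5, 48)
fused = np.concatenate(layer_states, axis=-1) @ W_fuse
print(fused.shape)  # (2, 5, 48)
```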
## Usage

```python
from speculators.models.eagle3 import Eagle3Speculator, Eagle3SpeculatorConfig
from transformers import AutoModelForCausalLM

# Load the verifier (target) model
verifier = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

# Load the Eagle-3 speculator on top of it
speculator = Eagle3Speculator.from_pretrained(
    "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
    verifier=verifier,
)
```
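At decode time the draft head scores only the 32K draft vocabulary, so a lookup table is needed to translate draft token ids back into the target model's 128,256-id space. The toy sketch below shows that id translation only; the names (`d2t`) and the tiny vocabulary sizes are illustrative stand-ins, not the buffers the checkpoint actually stores.

```python
import numpy as np

DRAFT_VOCAB = 8    # stands in for 32,000
TARGET_VOCAB = 20  # stands in for 128,256

# Hypothetical draft->target id table: entry i is the target-vocab id of
# draft token i (here an arbitrary injective mapping for illustration).
d2t = np.array([3, 7, 0, 12, 5, 19, 8, 1])

# Pretend these are the draft head's logits over its small vocabulary
draft_logits = np.random.default_rng(1).standard_normal(DRAFT_VOCAB)
draft_token = int(draft_logits.argmax())   # greedy pick in the draft id space
target_token = int(d2t[draft_token])       # same token, target id space
assert 0 <= target_token < TARGET_VOCAB
print(draft_token, target_token)
```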
## Configuration

This model uses the Eagle-3 architecture with:

- Hidden size: 6,144 (draft model)
- Target hidden size: 8,192 (70B Llama model)
- Attention heads: 48
- Key-value heads: 8
- Intermediate size: 16,384
- RMS norm epsilon: 1e-05
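These numbers imply the usual grouped-query attention bookkeeping: 48 query heads over a 6,144-wide hidden state gives a per-head dimension of 128, and each of the 8 key-value heads is shared by a group of 6 query heads. A quick check of that arithmetic:

```python
hidden_size = 6144
num_attention_heads = 48
num_key_value_heads = 8

head_dim = hidden_size // num_attention_heads            # width of each query head
kv_groups = num_attention_heads // num_key_value_heads   # query heads per KV head

print(head_dim, kv_groups)  # 128 6
```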
## Original Model

Converted from: [yuhuili/EAGLE3-LLaMA3.3-Instruct-70B](https://huggingface.co/yuhuili/EAGLE3-LLaMA3.3-Instruct-70B)

## Citation

Based on the Eagle-3 paper: https://arxiv.org/abs/2503.01840

## License

Please refer to the base Llama-3.3 model license.