OLMoE-1B-7B-Eagle3 Draft Model

This repository provides the EAGLE Draft model weights, related code, and training data based on OLMoE-1B-7B-Eagle3.


📦 Included Files

  • pytorch_model.bin: Trained EAGLE Draft model weights
  • config.json: Model configuration file (OLMoE architecture)
  • tokenizer_config.json: Tokenizer configuration file
  • modeling_olmoe_kv.py: OLMoE-specific model code (required for EAGLE inference)
  • eagle_data.json: Training dataset (ShareGPT questions + OLMoE-generated answers)
  • .gitattributes: Git LFS settings, etc.

🦅 What is the EAGLE Draft Model?

EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs)
by training a draft decoder layer separately.

  • Fully compatible with OLMoE-1B-7B-0125-Instruct architecture
  • The EAGLE Draft layer is structurally similar to the main model’s decoder
  • During inference, the draft layer generates multiple tokens in advance, which are then verified/accepted by the main model

📝 Training Data Description

  • eagle_data.json
    • Only questions (prompts) are extracted from the ShareGPT dataset
    • For each question, the allenai/OLMoE-1B-7B-0125-Instruct model generates its own answer
    • Thus, the model’s self-generated answers are used as ground truth to train the draft layer
    • This approach ensures the draft layer learns a distribution very close to the main model’s decoder,
      maximizing EAGLE inference performance

🛠️ Usage

1. Using Model Weights/Config Files

  • pytorch_model.bin, config.json, and tokenizer_config.json
    can be used directly with HuggingFace Transformers or EAGLE code.

2. Integrating with EAGLE Inference Code

  • Copy modeling_olmoe_kv.py
    into the official EAGLE repo at EAGLE/eagle/model/.
  • In your EAGLE inference script, import as:
    from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
    

3. Example Code

from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
 
tokenizer = AutoTokenizer.from_pretrained('allenai/OLMoE-1B-7B-0125-Instruct')
model = EaModel.from_pretrained(
    base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
    ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
    torch_dtype='bfloat16',
    low_cpu_mem_usage=True,
    total_token=-1
)

your_message = "Why we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).to(DEVICE)

output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)

⚠️ Notes

  • eagle_data.json contains only OLMoE-generated answers for public ShareGPT questions.
  • The EAGLE Draft layer should be designed as close as possible to the main model’s decoder
    for optimal inference efficiency.
  • modeling_olmoe_kv.py must be included in your EAGLE inference code for correct operation.

📚 References


For questions or feedback, please open an issue!

Downloads last month
159
Safetensors
Model size
0.1B params
Tensor type
I64
·
BF16
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for wantsleep/OLMoE_1B_7B_Eagle3

Dataset used to train wantsleep/OLMoE_1B_7B_Eagle3