Dataset: anon8231489123/ShareGPT_Vicuna_unfiltered
This repository provides the EAGLE draft model weights, supporting code, and training data for OLMoE-1B-7B-Eagle3.
- pytorch_model.bin: Trained EAGLE draft model weights
- config.json: Model configuration file (OLMoE architecture)
- tokenizer_config.json: Tokenizer configuration file
- modeling_olmoe_kv.py: OLMoE-specific model code (required for EAGLE inference)
- eagle_data.json: Training dataset (ShareGPT questions + OLMoE-generated answers)
- .gitattributes: Git LFS settings, etc.

EAGLE is a framework designed to dramatically accelerate inference for large language models (LLMs) by training a draft decoder layer separately.
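To make the draft-and-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding in plain Python. It is not EAGLE's actual algorithm (EAGLE drafts from the base model's hidden features and verifies a tree of candidates); the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical and exist only for illustration. It shows why the output matches what the base model alone would produce: every emitted token is either confirmed or supplied by the target.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Return the tokens accepted in one draft-and-verify round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. The target checks each proposed position in order. A real system
    #    scores all positions in a single forward pass; this toy loops.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # fix the first mismatch, then stop
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[Token],
             max_new_tokens: int, k: int = 4) -> List[Token]:
    """Generate with repeated draft-and-verify rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


def greedy(target: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain one-token-at-a-time decoding with the target only."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out


def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10  # counts 0..9 and wraps around


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except after a 4, where it guesses wrong.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

In this toy, `generate(toy_target, toy_draft, [0], 12)` returns the same sequence as `greedy(toy_target, [0], 12)`; the speedup in a real system comes from the target verifying several drafted tokens per forward pass instead of producing one.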
- Obtain pytorch_model.bin, config.json, and tokenizer_config.json from this repository.
- Place modeling_olmoe_kv.py in EAGLE/eagle/model/.
from eagle.model.modeling_olmoe_kv import OlmoeForCausalLM
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained('allenai/OLMoE-1B-7B-0125-Instruct')
model = EaModel.from_pretrained(
base_model_path='allenai/OLMoE-1B-7B-0125-Instruct',
ea_model_path='wantsleep/OLMoE_1B_7B_Eagle3',
torch_dtype='bfloat16',
low_cpu_mem_usage=True,
total_token=-1
)
your_message = "Why do we study math?"
conv = get_conversation_template("vicuna")
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = model.tokenizer([prompt]).input_ids
DEVICE = next(model.parameters()).device  # put inputs on the same device as the model weights
input_ids = torch.as_tensor(input_ids).to(DEVICE)
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512, top_k=8)
output = model.tokenizer.decode(output_ids[0])
print(output)
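The usage code imports `eagle.model.modeling_olmoe_kv`, so modeling_olmoe_kv.py has to be present in EAGLE/eagle/model/ before it runs. A small sketch of how that copy step could be scripted is shown below; `install_model_file` is a hypothetical helper, not part of EAGLE, and it assumes a local EAGLE checkout whose root directory you pass in.

```python
import shutil
from pathlib import Path


def install_model_file(src: Path, eagle_root: Path) -> Path:
    """Copy src into <eagle_root>/eagle/model/ unless it is already there."""
    dest_dir = eagle_root / "eagle" / "model"
    if not dest_dir.is_dir():
        raise FileNotFoundError(
            f"{dest_dir} not found; is {eagle_root} an EAGLE checkout?"
        )
    dest = dest_dir / src.name
    if not dest.exists():  # never overwrite an existing (possibly edited) copy
        shutil.copy2(src, dest)
    return dest
```

For example, `install_model_file(Path("modeling_olmoe_kv.py"), Path("EAGLE"))` would place the file at EAGLE/eagle/model/modeling_olmoe_kv.py, assuming both paths exist relative to the current directory.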
modeling_olmoe_kv.py must be included in your EAGLE inference code for correct operation.

For questions or feedback, please open an issue!
Base model: allenai/OLMoE-1B-7B-0125