The license is inherited from the TAIDE Model.
This is an Eagle3 draft model for Llama-3.1-TAIDE-LX-8B-Chat, trained on a custom sharegpt_gpt4 dataset and intended for speculative-decoding inference with vLLM.
The following benchmark was run with this benchmarking file and these settings:
A single H100 GPU
dtype: float16
attention-backend: flashinfer
speculative_config:
  method: eagle3
  draft_tensor_parallel_size: 1
  num_speculative_tokens: 2
num_prompts: 1

Lhs: Baseline
Rhs: Eagle3
Eagle3 achieves roughly a 1.32x speedup in inference over the baseline.
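The settings listed above might be turned into a vLLM launch command roughly as follows. This is a minimal sketch, not the benchmark script used here: `TARGET_MODEL` and `DRAFT_MODEL` are placeholder ids to replace with the real repositories, the helper names are hypothetical, and the `--speculative-config` JSON flag is assumed to be supported by your vLLM version. The attention backend is left out because the way to select it differs between vLLM releases.

```python
import json
import shlex

# Placeholder ids (assumptions): replace with the real target and draft repositories.
TARGET_MODEL = "taide/Llama-3.1-TAIDE-LX-8B-Chat"
DRAFT_MODEL = "path/to/this-eagle3-draft-model"


def build_speculative_config(draft_model, num_speculative_tokens=2,
                             draft_tensor_parallel_size=1):
    """Return the speculative-decoding settings from the benchmark as a dict."""
    return {
        "method": "eagle3",
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
        "draft_tensor_parallel_size": draft_tensor_parallel_size,
    }


def build_serve_command(target_model, draft_model):
    """Build a `vllm serve` argument list with float16 and the Eagle3 draft model."""
    spec = build_speculative_config(draft_model)
    return [
        "vllm", "serve", target_model,
        "--dtype", "float16",
        "--speculative-config", json.dumps(spec),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be copied into a terminal.
    print(shlex.join(build_serve_command(TARGET_MODEL, DRAFT_MODEL)))
```

Building the command as a list and quoting it with `shlex.join` keeps the JSON argument intact when pasted into a shell. Running the same command without `--speculative-config` gives the baseline side of the comparison.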
