The license is inherited from the TAIDE Model.

This is an Eagle3 speculative-decoding draft model for Llama-3.1-TAIDE-LX-8B-Chat. It was trained on a custom sharegpt_gpt4 dataset and is intended for inference with vLLM.
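The sketch below shows one way to load the base model with this Eagle3 draft in vLLM's offline `LLM` API. It mirrors the benchmark settings listed in the benchmark section (float16, FlashInfer, `eagle3`, draft TP size 1, 2 speculative tokens). Several details are assumptions, not taken from this card. The base model ID `taide/Llama-3.1-TAIDE-LX-8B-Chat` and the draft path `path/to/this-eagle3-draft` are hypothetical placeholders. Passing `speculative_config` as a dict requires a recent vLLM release. Selecting FlashInfer through the `VLLM_ATTENTION_BACKEND` environment variable may differ across vLLM versions.

```python
import os

# Hypothetical IDs: replace with the actual base model and this Eagle3 draft model.
BASE_MODEL = "taide/Llama-3.1-TAIDE-LX-8B-Chat"
DRAFT_MODEL = "path/to/this-eagle3-draft"  # hypothetical placeholder

# Speculative decoding settings mirroring the benchmark configuration.
SPECULATIVE_CONFIG = {
    "method": "eagle3",
    "model": DRAFT_MODEL,
    "draft_tensor_parallel_size": 1,
    "num_speculative_tokens": 2,
}


def make_llm():
    """Build a vLLM engine with Eagle3 speculative decoding (needs a GPU and vLLM)."""
    # Request the FlashInfer attention backend unless the caller already chose one.
    # Assumption: this env var is honored by the installed vLLM version.
    os.environ.setdefault("VLLM_ATTENTION_BACKEND", "FLASHINFER")
    # Imported lazily so this module can be loaded without vLLM installed.
    from vllm import LLM

    return LLM(
        model=BASE_MODEL,
        dtype="float16",
        tensor_parallel_size=1,
        speculative_config=SPECULATIVE_CONFIG,
    )
```

On a GPU machine, `llm = make_llm()` followed by `llm.generate(prompts, SamplingParams(max_tokens=128))` (with `SamplingParams` imported from `vllm`) should run generation with the draft model proposing two tokens per step.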

The following benchmark was run with this benchmarking file and these settings:

A single H100 GPU
dtype: float16
attention-backend: flashinfer
speculative_config:
- method: eagle3
- draft_tensor_parallel_size: 1
- num_speculative_tokens: 2
num_prompts: 1
Left: baseline
Right: Eagle3
Result: roughly a 1.32x inference speedup over the baseline in this setup
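To see how `num_speculative_tokens: 2` limits the possible gain, here is a minimal sketch of the textbook speculative-decoding estimate. It rests on one simplifying assumption that is not measured on this card: each draft token is accepted independently with the same probability `alpha`. Under that assumption, with `k` draft tokens, one target-model forward pass emits on average (1 - alpha^(k+1)) / (1 - alpha) tokens. That value lies between 1 and k + 1, so it is at most 3 here. The estimate ignores the cost of running the draft model, so it is not directly comparable to a measured wall-clock speedup such as the one above.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes each of the k draft tokens is accepted independently with
    probability alpha (a simplifying assumption). The result lies in [1, k + 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the one token from the target model.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha**k.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)
```

For example, `expected_tokens_per_step(0.5, 2)` gives 1.75 tokens per target forward pass.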
