The license is inherited from the TAIDE Model.

This is an Eagle3 draft model for Llama-3.1-TAIDE-LX-8B-Chat, trained on a custom sharegpt_gpt4 dataset and intended for speculative-decoding inference with vLLM.

The following benchmark was run with this benchmarking file and these settings:

  • A single H100 GPU

  • dtype: float16

  • attention-backend: flashinfer

  • speculative_config:

    • method: eagle3
    • draft_tensor_parallel_size: 1
    • num_speculative_tokens: 2
  • num_prompts: 1

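In the figure, the left-hand side (Lhs) is the baseline and the right-hand side (Rhs) is Eagle3. With these settings, Eagle3 achieves roughly a 1.32x speedup in inference over the baseline.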

[Figure: benchmark results, baseline (left) vs. Eagle3 (right)]
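As a rough illustration, the benchmark settings listed above could be expressed as keyword arguments for vLLM's `LLM` class. This is a minimal sketch, not the benchmarking file itself: the target model ID is an assumed Hub name, `DRAFT_MODEL` is a hypothetical placeholder for wherever this Eagle3 model is stored, and the helper functions are made up for the example. The attention backend is left out because how it is selected depends on the vLLM version.

```python
import json

# Assumed identifiers, not confirmed by this model card.
TARGET_MODEL = "taide/Llama-3.1-TAIDE-LX-8B-Chat"  # assumed Hub ID of the base model
DRAFT_MODEL = "path/to/this-eagle3-model"          # hypothetical location of this Eagle3 model


def build_speculative_config(draft_model: str) -> dict:
    """Return the speculative-decoding settings from the benchmark as a dict."""
    return {
        "method": "eagle3",
        "model": draft_model,
        "draft_tensor_parallel_size": 1,
        "num_speculative_tokens": 2,
    }


def build_engine_kwargs(target_model: str, draft_model: str) -> dict:
    """Combine the target model, dtype, and speculative config into one kwargs dict."""
    return {
        "model": target_model,
        "dtype": "float16",
        "speculative_config": build_speculative_config(draft_model),
    }


if __name__ == "__main__":
    kwargs = build_engine_kwargs(TARGET_MODEL, DRAFT_MODEL)
    # Print the speculative config as JSON, e.g. for passing on a command line.
    print(json.dumps(kwargs["speculative_config"], indent=2))
    # With vLLM installed and a suitable GPU, the kwargs could be used like this:
    #   from vllm import LLM
    #   llm = LLM(**kwargs)
```

Keeping the configuration in a plain dict makes it easy to switch between the baseline run (omit `speculative_config`) and the Eagle3 run while holding every other setting fixed.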
