How to use amd/kimi-k2.5-eagle3-fp8 with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, LlamaForCausalLMEagle3

tokenizer = AutoTokenizer.from_pretrained("amd/kimi-k2.5-eagle3-fp8")
model = LlamaForCausalLMEagle3.from_pretrained("amd/kimi-k2.5-eagle3-fp8")
```
Model Overview
kimi-k2.5-eagle3-fp8 is an FP8-quantized version of lightseekorg/kimi-k2.5-eagle3, an Eagle3 MTP draft model for accelerating inference of Kimi-K2.5 with speculative decoding.
This checkpoint was quantized with AMD Quark. The FP8 quantization metadata is stored in the model config; the LM head was intentionally excluded from quantization.
Quantization Details
- Quantization tool: AMD Quark
- Quantization method: quark
- Format: FP8
- LM head: not quantized
- Export weight format: real quantized weights
The quantization metadata is stored in config.json, and the profiling summary is included in quark_profile.yaml.
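As a quick sanity check before serving, you can inspect the quantization metadata recorded in config.json. The sketch below is a minimal, hypothetical example; the exact key layout (`quantization_config`, `quant_method`) follows the usual Hugging Face convention for quantized checkpoints and is an assumption, not taken from this repository:

```python
import json

# Hypothetical sketch: quantized HF checkpoints conventionally record their
# settings under "quantization_config" in config.json. Key names here are
# assumptions; verify them against this checkpoint's actual config.json.
def summarize_quantization(config: dict) -> str:
    qcfg = config.get("quantization_config", {})
    method = qcfg.get("quant_method", "unknown")
    return f"quant_method={method}"

# Illustrative config fragment, not copied from the real checkpoint:
example = {"quantization_config": {"quant_method": "quark"}}
print(summarize_quantization(example))  # quant_method=quark
```

In practice you would `json.load` the downloaded config.json instead of the inline example dict.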
Intended Use
This model is intended to be used as an Eagle3 draft model for speculative decoding with moonshotai/Kimi-K2.5 as the target model.
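For illustration, a runtime with Eagle3 support (SGLang is shown here) can typically be launched with the target and draft models paired. This is a hedged sketch, not a verified recipe for this checkpoint: the speculative hyperparameter values are placeholders, and you should confirm flag names and FP8 support against your runtime's documentation.

```shell
# Hypothetical launch sketch: serve Kimi-K2.5 with this Eagle3 draft model
# via SGLang speculative decoding. The num-steps/topk/draft-tokens values
# below are illustrative placeholders, not tuned recommendations.
python -m sglang.launch_server \
  --model-path moonshotai/Kimi-K2.5 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path amd/kimi-k2.5-eagle3-fp8 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 8
```

Other runtimes expose equivalent configuration (for example vLLM's speculative decoding config); the same pairing of target and draft model applies.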
Because this is an AMD Quark FP8 checkpoint, make sure your inference runtime supports the quantization format and Eagle3 speculative decoding before deployment. Please validate quality and acceptance length in your own serving stack.
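To make "acceptance length" concrete: in speculative decoding, the draft model proposes several tokens per step, and the target model keeps only the prefix it agrees with. The toy sketch below uses plain greedy verification for clarity; it is not Eagle3's actual tree-based drafting, and the token values are made up:

```python
# Toy illustration (greedy verification, not the real Eagle3 algorithm):
# the target model accepts drafted tokens only while they match its own
# predictions; the accepted prefix length is the "acceptance length".
def accepted_prefix(draft_tokens, target_tokens):
    """Return the prefix of draft tokens that the target model agrees with."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        accepted.append(d)
    return accepted

# The target agrees with the first two of four drafted tokens, so the
# acceptance length for this step is 2.
print(accepted_prefix([5, 9, 3, 7], [5, 9, 1, 7]))  # [5, 9]
```

Averaging this length over many decoding steps is one simple way to measure how well a draft model matches its target in your serving stack.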
Citation and Acknowledgements
This model is derived from lightseekorg/kimi-k2.5-eagle3. Please refer to the source model card for the original training details, benchmarks, and acknowledgements.
License
Modifications Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.