Instructions to use XiaomiMiMo/MiMo-V2-Flash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use XiaomiMiMo/MiMo-V2-Flash with Transformers:
    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("XiaomiMiMo/MiMo-V2-Flash", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
docs: add KTransformers CPU offloading inference guide
#34
by ErvinX - opened
Add KTransformers as a recommended inference option for MiMo-V2-Flash.
KTransformers enables efficient deployment on consumer-grade hardware by offloading MoE expert computations to the CPU while keeping the other components on the GPU. On a system with 4× RTX 5090 GPUs and 2× AMD EPYC 9355 CPUs, it achieves up to 35.7 tokens/s decode speed.
Benchmarks: https://ktransformers.net/benchmarks#MiMo-V2-Flash-FP8-TP4
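To make the offloading idea in the description concrete, here is a minimal pure-Python sketch of a placement plan: routed-expert weights are assigned to the CPU and everything else is assigned to a GPU. This is not the KTransformers API. The function `plan_placement`, the module names, and the `".experts."` naming pattern are all hypothetical, chosen only for illustration. The round-robin GPU assignment is a simplification and does not model tensor parallelism.

```python
# Illustrative sketch only: NOT the KTransformers API.
# Module names and the ".experts." pattern are hypothetical placeholders.

def plan_placement(module_names, num_gpus):
    """Map each module name to a device string.

    Modules whose name contains ".experts." are placed on the CPU;
    all other modules are spread round-robin across the GPUs.
    """
    placement = {}
    gpu_slot = 0
    for name in module_names:
        if ".experts." in name:
            # MoE expert weights: large, sparsely activated, kept on CPU.
            placement[name] = "cpu"
        else:
            # Attention, router, and other dense parts stay on GPU.
            placement[name] = f"cuda:{gpu_slot % num_gpus}"
            gpu_slot += 1
    return placement


# Hypothetical module list for a tiny two-layer MoE model.
modules = [
    "layers.0.self_attn",
    "layers.0.mlp.gate",
    "layers.0.mlp.experts.0",
    "layers.0.mlp.experts.1",
    "layers.1.self_attn",
]
plan = plan_placement(modules, num_gpus=4)

for name, device in plan.items():
    print(f"{name} -> {device}")
```

The reason for this split is that expert weights account for most of an MoE model's parameters while only a few experts run for each token, so keeping them in host memory saves GPU memory for the components that run on every token.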
bwshen-mi changed pull request status to merged