Kimi-K2.6-mlx
MLX-compatible weights for moonshotai/Kimi-K2.6, converted using `mlx_lm.convert`.
Model Details
- Base model: moonshotai/Kimi-K2.6
- Architecture: Mixture-of-Experts (MoE), Kimi/DeepSeek family
- Parameters: ~1T total
- Layers: 61
- Hidden size: 7168
- KV heads: 64
- Quantization: INT4 compressed-tensors (pack-quantized)
- Model size on disk: ~612 GB (182 safetensors shards)
- Context length: 262,144 tokens
- Capabilities: text generation, thinking/reasoning, tool use (text-only; vision is not supported, so use mlx-vlm for VLM conversion). See the tool-use sketch after this list.
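To illustrate the tool-use capability, here is a minimal sketch of building a tool-call prompt with the tokenizer's chat template. The `get_weather` tool is a hypothetical example, and this assumes the chat template shipped with the tokenizer accepts a `tools` argument:

```python
from transformers import AutoTokenizer

# The tokenizer alone is enough to inspect how tool calls are templated.
tokenizer = AutoTokenizer.from_pretrained(
    "aidiffuser/Kimi-K2.6-mlx", trust_remote_code=True
)

# Hypothetical tool spec for illustration (JSON-schema style).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Render the full prompt, tools included, without tokenizing.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)
```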
Conversion
Converted from the original moonshotai/Kimi-K2.6 using:
```bash
python -m mlx_lm.convert \
  --hf-path moonshotai/Kimi-K2.6 \
  --mlx-path ./Kimi-K2.6-mlx \
  --trust-remote-code
```
No additional quantization was applied — the model ships with compressed-tensors INT4 from the source.
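To confirm that the quantization settings carried over, you can inspect the converted `config.json`. A minimal sketch; the exact key layout depends on the mlx_lm version used:

```python
import json
from pathlib import Path

# Path to the converted model directory (adjust to your layout).
config_path = Path("./Kimi-K2.6-mlx/config.json")
config = json.loads(config_path.read_text())

# mlx_lm stores its settings under "quantization"; compressed-tensors
# checkpoints may instead carry a "quantization_config" block.
for key in ("quantization", "quantization_config"):
    if key in config:
        print(key, "->", json.dumps(config[key], indent=2))
```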
Usage
With mlx-lm
```bash
pip install mlx-lm
```

```bash
python -m mlx_lm.generate \
  --model aidiffuser/Kimi-K2.6-mlx \
  --prompt "Hello, who are you?" \
  --trust-remote-code
```
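The model can also be driven from Python. A minimal sketch using the mlx_lm API (the prompt is illustrative):

```python
from mlx_lm import load, generate

# Download (or load from cache) and initialize the model and tokenizer.
model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

# Chat-style prompting via the tokenizer's chat template.
messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```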
With exo (distributed inference)
This model runs on exo with JACCL/RDMA tensor parallelism across multiple Apple Silicon nodes. Tested on a 2-node Mac Studio M3 Ultra cluster (512 GB + 512 GB unified memory) at ~21 tok/s.
Recommended sampling parameters
- temperature: 1.0
- top_p: 0.95
- min_p: 0.01
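With recent mlx_lm versions, these parameters can be wired in via a sampler object. A sketch; check `mlx_lm.sample_utils` in your installed version for the exact signature:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

# Recommended settings from above: temperature 1.0, top_p 0.95, min_p 0.01.
sampler = make_sampler(temp=1.0, top_p=0.95, min_p=0.01)

text = generate(
    model,
    tokenizer,
    prompt="Hello, who are you?",
    max_tokens=256,
    sampler=sampler,
)
```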
Hardware Requirements
This is a large MoE model. You will need significant unified memory to run it:
- Distributed: two Apple Silicon machines with 512+ GB each, connected via Thunderbolt (see the rough memory budget below)
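As a back-of-the-envelope budget, not a measurement: with ~612 GB of weights split across two nodes, each node holds ~306 GB, leaving headroom on a 512 GB machine for KV cache, activations, and the OS.

```python
# Back-of-the-envelope memory budget for 2-node tensor parallelism.
weights_gb = 612          # model size on disk (INT4 shards)
nodes = 2
per_node_memory_gb = 512  # M3 Ultra Mac Studio unified memory

weights_per_node = weights_gb / nodes             # ~306 GB
headroom = per_node_memory_gb - weights_per_node  # ~206 GB for KV cache etc.
print(f"weights/node: {weights_per_node:.0f} GB, headroom: {headroom:.0f} GB")
```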
License
This model inherits the Kimi K2 Community License from the base model.