Kimi-K2.6-mlx

MLX-compatible weights for moonshotai/Kimi-K2.6, converted using mlx_lm.convert.

Model Details

  • Base model: moonshotai/Kimi-K2.6
  • Architecture: Mixture-of-Experts (MoE), Kimi/DeepSeek family
  • Layers: 61
  • Hidden size: 7168
  • KV heads: 64
  • Quantization: INT4 compressed-tensors (pack-quantized)
  • Model size on disk: ~612 GB (182 safetensors shards)
  • Context length: 262,144 tokens
  • Capabilities: text generation, thinking/reasoning, tool use (text-only; vision is not supported in this conversion; use mlx-vlm for VLM conversion)
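
These fields can be verified directly against the converted checkpoint's config.json. A minimal sketch, assuming the config uses the standard Hugging Face key names (num_hidden_layers, hidden_size, num_key_value_heads, max_position_embeddings) and a local path to the converted weights:

import json

# Adjust the path to wherever the converted weights live.
with open("./Kimi-K2.6-mlx/config.json") as f:
    cfg = json.load(f)

print(cfg["num_hidden_layers"])        # expected: 61
print(cfg["hidden_size"])              # expected: 7168
print(cfg["num_key_value_heads"])      # expected: 64
print(cfg["max_position_embeddings"])  # expected: 262144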

Conversion

Converted from the original moonshotai/Kimi-K2.6 using:

python -m mlx_lm.convert \
  --hf-path moonshotai/Kimi-K2.6 \
  --mlx-path ./Kimi-K2.6-mlx \
  --trust-remote-code

No additional quantization was applied during conversion; the source checkpoint already ships with compressed-tensors INT4 weights.
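
The same conversion can be run from Python. A minimal sketch using the mlx_lm convert API (argument handling may differ slightly across mlx-lm releases; check your installed version):

from mlx_lm import convert

# Downloads moonshotai/Kimi-K2.6 from the Hub and writes MLX weights locally.
convert(
    hf_path="moonshotai/Kimi-K2.6",
    mlx_path="./Kimi-K2.6-mlx",
)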

Usage

With mlx-lm

pip install mlx-lm
python -m mlx_lm.generate \
  --model aidiffuser/Kimi-K2.6-mlx \
  --prompt "Hello, who are you?" \
  --trust-remote-code
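
The Python API works as well. A minimal sketch using mlx_lm.load and mlx_lm.generate (chat-template behavior depends on the tokenizer shipped with the checkpoint):

from mlx_lm import load, generate

model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)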

With exo (distributed inference)

This model runs on exo with JACCL/RDMA tensor parallelism across multiple Apple Silicon nodes. Tested on a 2-node Mac Studio M3 Ultra cluster (512 GB + 512 GB unified memory) at ~21 tok/s.
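
Once an exo cluster is running, it exposes a ChatGPT-compatible HTTP API that can be queried from any node. A minimal sketch of such a request; the port (52415 is exo's default in recent versions) and the model identifier are assumptions, so match them to your exo setup:

import requests

resp = requests.post(
    "http://localhost:52415/v1/chat/completions",  # assumed default exo API port
    json={
        "model": "aidiffuser/Kimi-K2.6-mlx",  # assumed model id; match your exo config
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])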

Recommended sampling parameters

temperature: 1.0
top_p: 0.95
min_p: 0.01
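
With mlx-lm, these parameters are passed via a sampler object. A minimal sketch using mlx_lm.sample_utils.make_sampler (available in recent mlx-lm releases):

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

# Sampler matching the recommended parameters above.
sampler = make_sampler(temp=1.0, top_p=0.95, min_p=0.01)

text = generate(
    model,
    tokenizer,
    prompt="Hello, who are you?",
    max_tokens=256,
    sampler=sampler,
)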

Hardware Requirements

This is a large MoE model; the full ~612 GB of weights (plus KV cache) must fit in unified memory:

  • Distributed: two Apple Silicon machines with 512+ GB of unified memory each, connected via Thunderbolt (the configuration tested above)

License

This model inherits the Kimi K2 Community License from the base model.
