Kimi-K2.6-mlx

MLX-compatible weights for moonshotai/Kimi-K2.6, converted using mlx_lm.convert.

Model Details

  • Base model: moonshotai/Kimi-K2.6
  • Architecture: Mixture-of-Experts (MoE), Kimi/DeepSeek family
  • Layers: 61
  • Hidden size: 7168
  • KV heads: 64
  • Quantization: INT4 compressed-tensors (pack-quantized)
  • Model size on disk: ~612 GB (182 safetensors shards)
  • Context length: 262,144 tokens
  • Capabilities: text generation, thinking/reasoning, tool use (text-only; vision is not supported in this conversion; use mlx-vlm for VLM conversion)
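
These fields can be verified directly against the converted checkpoint's config.json. A minimal sketch, assuming the config uses the standard Hugging Face key names (num_hidden_layers, hidden_size, num_key_value_heads, max_position_embeddings) and a local path to the converted weights:

import json

# Adjust the path to wherever the converted weights live.
with open("./Kimi-K2.6-mlx/config.json") as f:
    cfg = json.load(f)

print(cfg["num_hidden_layers"])        # expected: 61
print(cfg["hidden_size"])              # expected: 7168
print(cfg["num_key_value_heads"])      # expected: 64
print(cfg["max_position_embeddings"])  # expected: 262144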

Conversion

Converted from the original moonshotai/Kimi-K2.6 using:

python -m mlx_lm.convert \
  --hf-path moonshotai/Kimi-K2.6 \
  --mlx-path ./Kimi-K2.6-mlx \
  --trust-remote-code

No additional quantization was applied during conversion; the source checkpoint already ships with compressed-tensors INT4 weights.
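
The same conversion can be run from Python. A minimal sketch using the mlx_lm convert API (argument handling may differ slightly across mlx-lm releases; check your installed version):

from mlx_lm import convert

# Downloads moonshotai/Kimi-K2.6 from the Hub and writes MLX weights locally.
convert(
    hf_path="moonshotai/Kimi-K2.6",
    mlx_path="./Kimi-K2.6-mlx",
)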

Usage

With mlx-lm

pip install mlx-lm
python -m mlx_lm.generate \
  --model aidiffuser/Kimi-K2.6-mlx \
  --prompt "Hello, who are you?" \
  --trust-remote-code
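
The Python API works as well. A minimal sketch using mlx_lm.load and mlx_lm.generate (chat-template behavior depends on the tokenizer shipped with the checkpoint):

from mlx_lm import load, generate

model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)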

With exo (distributed inference)

This model runs on exo with JACCL/RDMA tensor parallelism across multiple Apple Silicon nodes. Tested on a 2-node Mac Studio M3 Ultra cluster (512 GB + 512 GB unified memory) at ~21 tok/s.
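
Once an exo cluster is running, it exposes a ChatGPT-compatible HTTP API that can be queried from any node. A minimal sketch of such a request; the port (52415 is exo's default in recent versions) and the model identifier are assumptions, so match them to your exo setup:

import requests

resp = requests.post(
    "http://localhost:52415/v1/chat/completions",  # assumed default exo API port
    json={
        "model": "aidiffuser/Kimi-K2.6-mlx",  # assumed model id; match your exo config
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])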

Recommended sampling parameters

temperature: 1.0
top_p: 0.95
min_p: 0.01
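
With mlx-lm, these parameters are passed via a sampler object. A minimal sketch using mlx_lm.sample_utils.make_sampler (available in recent mlx-lm releases):

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("aidiffuser/Kimi-K2.6-mlx")

# Sampler matching the recommended parameters above.
sampler = make_sampler(temp=1.0, top_p=0.95, min_p=0.01)

text = generate(
    model,
    tokenizer,
    prompt="Hello, who are you?",
    max_tokens=256,
    sampler=sampler,
)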

Hardware Requirements

This is a large MoE model; the full ~612 GB of weights (plus KV cache) must fit in unified memory:

  • Distributed: two Apple Silicon machines with 512+ GB of unified memory each, connected via Thunderbolt (the configuration tested above)

License

This model inherits the Kimi K2 Community License from the base model.
