# Kimi-K2.6 Vision Weights

Vision-only weights extracted from `moonshotai/Kimi-K2.6` for use with MLX-based inference.
## Contents

- `kimi_k26_vision.safetensors`: 335 tensors, ~899 MB (BF16)
  - `vision_tower.*`: 329 tensors (MoonViT encoder, 27 layers)
  - `mm_projector.*`: 6 tensors (PatchMergerMLP projector)
- `config.json`: vision config + projector metadata
## Architecture
| Component | Details |
|---|---|
| Vision Encoder | MoonViT: 27 layers, 1152 hidden, 16 heads, patch_size=14 |
| Patch Merger | 2x2 spatial merge + temporal pool (no learned params) |
| Projector | LayerNorm(1152) → Linear(4608→4608) → GELU → Linear(4608→7168) |
| Total params | ~450M |
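The projector path in the table can be sketched at the shape level in NumPy. This is an illustrative sketch only: the weights below are random placeholders, and the helper names (`project`, `layer_norm`) and the tanh-GELU approximation are assumptions, not this repo's actual code.

```python
import numpy as np

D_VIT, D_MERGED, D_TEXT = 1152, 4608, 7168  # dims from the table (4608 = 4 * 1152)

def layer_norm(x, eps=1e-5):
    # per-token normalization over the last (feature) axis
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU (choice of approximation is an assumption)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def project(patches, h, w, w1, b1, w2, b2):
    """patches: (h*w, 1152) encoder outputs for one image."""
    x = layer_norm(patches)                                   # LayerNorm(1152)
    # 2x2 spatial merge: concatenate each 2x2 block of patches -> 4608 features
    x = x.reshape(h // 2, 2, w // 2, 2, D_VIT).transpose(0, 2, 1, 3, 4)
    x = x.reshape(-1, D_MERGED)
    x = gelu(x @ w1 + b1)                                     # Linear(4608->4608) + GELU
    return x @ w2 + b2                                        # Linear(4608->7168)

# shape check with random placeholder weights
h, w = 16, 16
rng = np.random.default_rng(0)
out = project(
    rng.standard_normal((h * w, D_VIT)).astype(np.float32), h, w,
    rng.standard_normal((D_MERGED, D_MERGED)).astype(np.float32) * 0.01,
    np.zeros(D_MERGED, np.float32),
    rng.standard_normal((D_MERGED, D_TEXT)).astype(np.float32) * 0.01,
    np.zeros(D_TEXT, np.float32),
)
print(out.shape)  # (64, 7168): 256 patches merged 4-to-1, projected to text width
```

Note how the 2×2 merge is what turns the 1152-wide encoder features into the 4608-wide projector input without any learned parameters.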
Kimi-K2.6 uses the same vision architecture as Kimi-K2.5 (and the same vision encoder as Kimi-VL-A3B). The projector output dimension is 7168 to match the K2.6 text backbone hidden size.
## Usage

These weights are designed to be loaded alongside text-only MLX ports of Kimi-K2.6 (e.g. `mlx-community/Kimi-K2.6-mlx-DQ3_K_M-q8`) to enable vision-language capabilities.
The vision encoder and projector together map each image to `(N, 7168)` embedding vectors, which replace media-placeholder tokens in the text embedding stream.
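The placeholder-replacement step can be sketched as a simple masked write into the token embedding matrix. The helper name `splice_image_embeddings` and the toy shapes are hypothetical, not part of this repo:

```python
import numpy as np

D_TEXT = 7168  # K2.6 text backbone hidden size

def splice_image_embeddings(token_embs, image_embs, placeholder_mask):
    """Replace media-placeholder rows of the text embedding stream
    with projected vision embeddings (one row per placeholder token)."""
    assert placeholder_mask.sum() == image_embs.shape[0]
    out = token_embs.copy()
    out[placeholder_mask] = image_embs
    return out

# toy example: a 6-token sequence with 2 media-placeholder positions
tokens = np.zeros((6, D_TEXT), dtype=np.float32)
images = np.ones((2, D_TEXT), dtype=np.float32)
mask = np.array([False, True, True, False, False, False])
spliced = splice_image_embeddings(tokens, images, mask)
print(spliced.shape)  # (6, 7168); rows 1-2 now hold the image embeddings
```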
## Reproduction

Extracted from shards 63 + 64 of `moonshotai/Kimi-K2.6`. The vision tensors live entirely in those two shards (`mm_projector` in 63, `vision_tower` in 64). No modifications to the weights; original BF16 precision preserved.

See `extract_vision_weights.py` for the script.
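The core of the extraction is a prefix filter over the shard's tensor names. A minimal sketch (the toy `shard` dict stands in for the real files; actual extraction would stream the shards with the `safetensors` library's `safe_open`/`save_file`):

```python
import numpy as np

VISION_PREFIXES = ("vision_tower.", "mm_projector.")

def extract_vision(state_dict):
    # keep only tensors belonging to the vision encoder or projector
    return {k: v for k, v in state_dict.items() if k.startswith(VISION_PREFIXES)}

# toy stand-in for the contents of shards 63 + 64 (names are illustrative)
shard = {
    "model.layers.60.mlp.w1": np.zeros(2),
    "vision_tower.blocks.0.attn.qkv.weight": np.zeros(2),
    "mm_projector.linear_1.weight": np.zeros(2),
}
vision = extract_vision(shard)
print(sorted(vision))  # text-backbone tensor is dropped, vision tensors kept
```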
## License
Same license as the source model: Kimi-K2.6 License