Kimi-K2.7-Code-JANGTQ_K

JANGTQ (JANG TurboQuant) quantization of moonshotai/Kimi-K2.7-Code for MLX on Apple silicon. TurboQuant quantizes the routed experts with a random-sign Hadamard rotation, a per-row FP16 norm, and a per-layer Lloyd-Max codebook. The rest of the backbone is kept at higher precision.
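To make the three ingredients concrete, here is a minimal NumPy sketch of the general technique: rotate each row with a random-sign Hadamard matrix, store one FP16 norm per row, and quantize the normalized entries against a single Lloyd-Max codebook fitted to the whole matrix. It is an illustration only, not the jang-tools implementation. The function names (`hadamard`, `lloyd_max`, `quantize`, `dequantize`), the dense Hadamard matrix, the unpacked `uint8` codes, and the choice of one weight matrix as the "layer" are all assumptions made for this example.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max(samples, bits, iters=20):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    levels = np.quantile(samples, (np.arange(2**bits) + 0.5) / 2**bits)
    for _ in range(iters):
        nearest = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(levels.size):
            members = samples[nearest == k]
            if members.size:
                levels[k] = members.mean()
    return levels

def quantize(W, bits, seed=0):
    """Return (codes, per-row FP16 norms, codebook, rotation) for a 2-D matrix W."""
    rows, cols = W.shape
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=cols)
    Q = signs[:, None] * hadamard(cols)            # diag(signs) @ H, orthogonal
    W_rot = W @ Q                                  # rotate every row
    norms = np.linalg.norm(W_rot, axis=1, keepdims=True)
    unit = W_rot / np.maximum(norms, 1e-12)        # rows scaled to unit length
    codebook = lloyd_max(unit.ravel(), bits)       # one codebook for the whole matrix
    codes = np.abs(unit[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return codes, norms.astype(np.float16), codebook, Q

def dequantize(codes, norms, codebook, Q):
    """Look up levels, rescale by the stored norms, and undo the rotation."""
    return (codebook[codes] * norms.astype(np.float64)) @ Q.T

# Toy demo on a random matrix (not real model weights).
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 128))
for bits in (2, 4):
    W_hat = dequantize(*quantize(W, bits))
    rel = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"{bits}-bit relative error: {rel:.3f}")
```

The rotation spreads outliers across each row so that a single scalar codebook fits the entries well. A production kernel would typically use a fast Walsh-Hadamard transform and bit-packed codes rather than the dense matrix and `uint8` array used here.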

Profile: JANGTQ_K

Component        Precision
Routed experts   gate 2-bit, up 2-bit, down 4-bit
Attention        8-bit
Shared experts   8-bit
Dense MLP        8-bit
Embeddings       8-bit
LM head          8-bit
Norms / router   FP16
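As a rough guide to what the routed-expert row means for storage, the sketch below averages the three projection widths and adds the cost of one FP16 norm per row. It assumes that the gate, up, and down projections hold the same number of weights (as in a standard gated MLP) and ignores codebook storage. The helper names and the row length of 2048 are hypothetical and are not taken from the model configuration.

```python
# Bit widths of the routed-expert projections, as listed in the profile.
expert_bits = {"gate": 2, "up": 2, "down": 4}

# Assumption: the three projections contain equal numbers of weights,
# so the plain mean is the average code width.
avg_bits = sum(expert_bits.values()) / len(expert_bits)

def norm_overhead_bits(row_len):
    """Extra bits per weight from one FP16 norm per row of `row_len` weights."""
    return 16 / row_len

def packed_gib(n_params, bits_per_weight):
    """Size in GiB of `n_params` weights stored at `bits_per_weight`."""
    return n_params * bits_per_weight / 8 / 2**30

print(f"routed experts: {avg_bits:.2f} bits/weight before overhead")
# row_len=2048 is a hypothetical value chosen only for illustration.
print(f"norm overhead at row_len=2048: {norm_overhead_bits(2048):.4f} bits/weight")
```

Passing the routed-expert parameter count and `avg_bits + norm_overhead_bits(row_len)` to `packed_gib` gives a first-order size estimate for that part of the checkpoint.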

Requirements

  • Load with the jang-tools Python package or vMLX. Not supported by stock MLX, LM Studio, or Ollama.
  • Kimi's tokenizer uses tiktoken and custom code: pip install tiktoken blobfile and set TRANSFORMERS_TRUST_REMOTE_CODE=1.
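A small pre-flight check can catch a missing tokenizer dependency before a long model load begins. The sketch below uses only the standard library. The helper names `missing_requirements` and `remote_code_enabled` are made up for this example and are not part of jang-tools.

```python
import importlib.util
import os

def missing_requirements(packages=("tiktoken", "blobfile")):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

def remote_code_enabled(env=os.environ):
    """True if the environment variable from the requirements list is set to 1."""
    return env.get("TRANSFORMERS_TRUST_REMOTE_CODE") == "1"

missing = missing_requirements()
if missing:
    print("Missing packages; run: pip install " + " ".join(missing))
if not remote_code_enabled():
    print("Set TRANSFORMERS_TRUST_REMOTE_CODE=1 before loading the tokenizer.")
```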

Usage

import os
# Kimi's tokenizer ships custom code; opt in before anything loads it.
os.environ["TRANSFORMERS_TRUST_REMOTE_CODE"] = "1"

from jang_tools.load_jangtq import load_jangtq_model as load
from mlx_lm import generate

# Load the quantized model and its tokenizer.
model, tokenizer = load("bearzi/Kimi-K2.7-Code-JANGTQ_K")
msgs = [{"role": "user", "content": "Write a Python function that reverses a string."}]
# Render the chat template to a prompt string that ends with the generation prompt.
prompt = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True))

License

Inherited from the base model moonshotai/Kimi-K2.7-Code; quantization does not change the upstream terms. Attribution is required only for very large commercial deployments (see the license in the base model repository for the exact terms).
