Kimi-K2.6 REAP Solidity - Q4 MLX
This is a proof-of-concept Solidity-specialized Kimi-K2.6 checkpoint for local MLX inference on Apple Silicon. Kimi-K2.6 is released by Moonshot AI in a native INT4 / compressed-tensors format; this checkpoint preserves that Q4-class release format in MLX affine Q4 while applying REAP-style Mixture-of-Experts pruning. The pruning keeps the routed experts most active for the target workload, removes the least relevant routed experts, and preserves the rest of the model structure.
The goal is practical local use: retain useful smart-contract coding, investigation, and fork-test behavior while reducing the expert footprint enough to run on a single high-memory Mac Studio.
This release is intentionally framed against Moonshot AI hosted Kimi-K2.6 inference, not just against smaller local baselines. Kimi-K2.6 is a tier-1 open-weight MoE model, and its native INT4 format is part of Moonshot's post-training quantization-aware release process, not an aftermarket quality-reducing quantization. This checkpoint shows that a domain-specialized REAP variant can run locally on a Mac Studio M3 Ultra 512 GB and complete demanding Solidity-agent tasks in the same target domain.
Built with mlx_fun, an MLX-native toolkit for MoE quantization, expert pruning, and expert-activation analysis.
Each row is one of the 60 MoE layers; each column is one of the 384 routed experts in the base model. Lower summed rank indicates higher observed importance for the calibration workload. The released checkpoint keeps 256 of 384 routed experts per MoE layer.
Model Details
| Field | Value |
|---|---|
| Base model | moonshotai/Kimi-K2.6 |
| Format | MLX safetensors with custom Kimi K2.5/K2.6 code files |
| Quantization | MLX affine Q4, preserving the native INT4/Q4-class Kimi-K2.6 release format |
| Pruning method | REAP per-layer expert pruning by saliency |
| Source routed experts | 384 per MoE layer |
| Experts kept | 256 / 384 routed experts per MoE layer |
| Expert reduction | 128 experts removed per MoE layer, ~33% reduction |
| Layers | 61 total text layers, including 1 dense layer and 60 MoE layers |
| Selected experts per token | 8 |
| Context length | 262,144 tokens |
| Disk size | ~401 GB |
| Shards | 180 safetensors files |
| Build and evaluation machine | Mac Studio M3 Ultra, 512 GB unified memory |
| Recommended hardware | Apple Silicon with >= ~440 GB unified memory |
The number of experts kept is uniform across MoE layers, but the specific expert indices differ per layer. The router and related expert-indexed tensors are sliced to match the retained experts.
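As a minimal sketch of what layer-specific slicing implies, the toy example below keeps a different subset of experts in two layers and records the mapping from new expert index back to original expert index. The function name `slice_router`, the list-of-rows router layout, and the toy sizes are assumptions made for illustration; this is not the mlx_fun API or the actual tensor layout of the checkpoint.

```python
# Hypothetical sketch: names, shapes, and data are illustrative only.

def slice_router(router_rows, kept):
    """Keep only the router rows of retained experts, in ascending original order.

    Returns the sliced rows and a list mapping new index -> original index.
    """
    kept_sorted = sorted(kept)
    return [router_rows[i] for i in kept_sorted], kept_sorted

# Toy router: 6 experts, hidden size 2 (the real model has 384 experts per layer).
router = [[float(e), float(e) + 0.5] for e in range(6)]

# Same number of experts kept in each layer, but different indices.
kept_layer0 = [0, 2, 3, 5]
kept_layer1 = [1, 3, 4, 5]

sliced0, map0 = slice_router(router, kept_layer0)
sliced1, map1 = slice_router(router, kept_layer1)

# New expert index 1 points at a different original expert in each layer.
print(map0[1], map1[1])
```

Keeping the new-to-original mapping per layer is what lets downstream analysis relate a pruned expert index back to the base model.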
Intended Use
This checkpoint is intended for local, domain-focused Solidity work:
- Smart-contract review assistance.
- Solidity exploit investigation and vulnerability triage.
- Foundry-style fork-test generation and debugging.
- Research into MoE routing, expert saliency, and workload-specific pruning.
It is not intended to be a general-purpose replacement for the full Kimi-K2.6 model. Capability outside the Solidity and smart-contract engineering domain is expected to be degraded relative to the unpruned base model.
Build Summary
The checkpoint was produced from Kimi-K2.6 by:
- Collecting routed-expert saliency from a focused Solidity-oriented workload.
- Converting the native INT4/Q4-class base release to MLX affine Q4.
- Removing the least-salient 128 routed experts from each MoE layer.
- Preserving dense layers, top-k routing shape, shared experts, and router/tensor alignment.
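The per-layer selection step can be sketched as follows. This is an illustration under stated assumptions, not the mlx_fun implementation: the saliency values are made up, the helper `select_kept_experts` is hypothetical, and a higher score is assumed to mean a more important expert.

```python
# Hypothetical sketch: drop the least-salient experts independently in each layer.

def select_kept_experts(saliency, num_remove):
    """Return sorted indices of the experts kept after removing the
    num_remove lowest-saliency experts (assumes higher = more important)."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])  # ascending
    removed = set(order[:num_remove])
    return [i for i in range(len(saliency)) if i not in removed]

# Toy data: 2 layers x 6 experts, remove 2 per layer.
# The released checkpoint removes 128 of 384 experts per MoE layer.
layer_saliency = [
    [0.9, 0.1, 0.5, 0.05, 0.7, 0.3],
    [0.2, 0.8, 0.01, 0.6, 0.4, 0.02],
]
kept_per_layer = [select_kept_experts(s, 2) for s in layer_saliency]

for layer, kept in enumerate(kept_per_layer):
    print(layer, kept)
```

Because selection runs per layer, the kept count is uniform while the kept indices differ between layers.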
All conversion, saliency collection, pruning, and local evaluation work for this release was completed on a Mac Studio M3 Ultra with 512 GB unified memory. That hardware point matters: the result is a local, reproducible workstation workflow for a tier-1 open-weight MoE model, not a cloud-only experiment.
Private prompts, internal task traces, target contracts, and operational pipeline details are intentionally omitted from this public model card. To apply the same method to another workload, collect your own routed-expert activation statistics with mlx_fun, then prune the target checkpoint using the resulting per-layer saliency.
Evaluation Snapshot
The following results are from a representative end-to-end smart-contract audit workload with multiple agent stages. They compare this pruned local Q4 checkpoint against a same-base local Q3 checkpoint and OpenRouter-routed Moonshot AI Kimi-K2.6 inference.
These are workload snapshots, not a formal benchmark suite, provider benchmark, or security certification. Hardware for the local runs was a Mac Studio M3 Ultra with 512 GB unified memory. The target details and internal pipeline names are intentionally abstracted in this public card.
Exploit-Investigation Stage
This stage asks the agent to reason through the exploit path and emit the confirmed vulnerability finding. The fully cold REAP row has no external knowledge priming.
| Configuration | Outcome | Wall time | Input tokens | Output tokens | Total tokens | Tool calls |
|---|---|---|---|---|---|---|
| OpenRouter-routed Moonshot AI Kimi-K2.6 | Critical canonical | 293s | 293,191 | 7,799 | 300,990 | 19 |
| Local Q3 baseline | Critical canonical | 1,665s | 659,769 | 13,721 | 673,490 | 21 |
| This model, local Q4 REAP, fully cold | Confirmed critical | 1,498s | 267,829 | 9,613 | 277,442 | 17 |
On this exploit-investigation stage, REAP uses ~59% fewer total tokens and ~19% fewer tool calls than the local Q3 baseline while reaching the same canonical outcome. Compared with OpenRouter-routed Moonshot AI inference, REAP uses about 8% fewer total tokens and two fewer tool calls, but its wall-clock time is longer because local Apple Silicon prefill and generation throughput is lower.
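The relative figures can be reproduced directly from the table above. The short check below uses only the token and tool-call counts reported in the table; the helper name `pct_reduction` is introduced here for illustration.

```python
# Recompute the relative reductions from the exploit-investigation table.

def pct_reduction(baseline, value):
    """Percentage by which value is lower than baseline."""
    return 100.0 * (baseline - value) / baseline

# Total tokens per configuration (from the table).
q3_tokens, reap_tokens, hosted_tokens = 673_490, 277_442, 300_990
# Tool calls per configuration (from the table).
q3_calls, reap_calls = 21, 17

tokens_vs_q3 = pct_reduction(q3_tokens, reap_tokens)
calls_vs_q3 = pct_reduction(q3_calls, reap_calls)
tokens_vs_hosted = pct_reduction(hosted_tokens, reap_tokens)

print(f"tokens vs Q3: {tokens_vs_q3:.1f}%")
print(f"tool calls vs Q3: {calls_vs_q3:.1f}%")
print(f"tokens vs hosted: {tokens_vs_hosted:.1f}%")
```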
Throughput Snapshot
| Configuration | Output throughput | Prefill throughput |
|---|---|---|
| This model, local Q4 REAP | ~10-13 tok/s | ~300 tok/s |
| Moonshot AI hosted Kimi-K2.6 inference | ~23-31 tok/s | ~2,000 tok/s |
Moonshot AI hosted inference is faster in wall-clock terms. The point of REAP here is different: keep a tier-1 open-weight model useful for a high-value domain while running locally, privately, and repeatably on one Mac Studio.
Quickstart
LM Studio
Place the directory under:
~/.lmstudio/models/<owner>/Kimi-K2.6-REAP-Solidity
Enable Trust Remote Code in LM Studio. The checkpoint ships the required Moonshot Kimi custom modeling files.
mlx-fun serve
mlx-fun serve \
--model /path/to/Kimi-K2.6-REAP-Solidity \
--trust-remote-code \
--idle-timeout 7200 \
--default-top-k 100 \
--default-repetition-penalty 1.05 \
--prompt-cache-size 1
The included chat_template.jinja handles Kimi tool calls and reasoning blocks.
Limitations
- This is not a fine-tune. The model weights were not trained further; routed experts were removed based on observed workload saliency.
- This is not a security-certified auditor. It can miss vulnerabilities, invent false positives, or generate unsafe assumptions. Human review is required for smart-contract findings.
- The evaluation above is a small workload snapshot, not a broad benchmark corpus. Test on your own contracts, vulnerability classes, and agent setup before relying on the model.
- The Q4 wording here refers to preserving Kimi-K2.6's native INT4/Q4-class release format in MLX. It should not be read as an additional aftermarket low-bit compression step from a BF16 checkpoint.
- Saliency was collected from a lower-bit capture checkpoint and then applied to the Q4 target. Routing saliency is generally stable across quantization levels, but stricter reproductions should collect saliency directly on their target quantization.
- Tooling that assumes expert index N has the same meaning across all layers will not apply cleanly. Expert retention is layer-specific.
- Kimi-K2.6 is multimodal, but only the text path was exercised during calibration and evaluation. Vision inputs have not been measured after pruning.
License
This checkpoint inherits the base model's Modified MIT terms. See moonshotai/Kimi-K2.6 and the included LICENSE file for the original terms.
Links
- Toolkit: github.com/dexloom/mlx_fun
- Base model: moonshotai/Kimi-K2.6
- Made by: sombrax.com
- Issues / discussion: open one on the mlx_fun repo

If this model or workflow is useful, please star the mlx_fun repo.