Instructions for using sombra-x/Kimi-K2.6-REAP-Solidity with libraries and local apps.

Libraries

How to use sombra-x/Kimi-K2.6-REAP-Solidity with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("sombra-x/Kimi-K2.6-REAP-Solidity")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use sombra-x/Kimi-K2.6-REAP-Solidity with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sombra-x/Kimi-K2.6-REAP-Solidity"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sombra-x/Kimi-K2.6-REAP-Solidity"
        }
      ]
    }
  }
}
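
If `~/.pi/agent/models.json` already lists other providers, pasting the snippet above over the file would drop them. The following is a minimal Python sketch of merging the `mlx-lm` entry into an existing config instead. The helper names `merge_mlx_provider` and `write_models_json` are hypothetical, and the sketch assumes nothing about Pi's config schema beyond the fields shown above.

```python
import json
from pathlib import Path

MODEL_ID = "sombra-x/Kimi-K2.6-REAP-Solidity"


def merge_mlx_provider(config: dict, model_id: str = MODEL_ID) -> dict:
    """Return a copy of `config` with the mlx-lm provider added or updated."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    provider = merged.setdefault("providers", {}).setdefault("mlx-lm", {})
    provider.update({
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
    })
    models = provider.setdefault("models", [])
    # Only append the model if it is not already registered.
    if not any(m.get("id") == model_id for m in models):
        models.append({"id": model_id})
    return merged


def write_models_json(path: Path) -> None:
    """Merge the provider into the file at `path`, creating it if missing."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mlx_provider(existing), indent=2) + "\n")
```

Calling `write_models_json(Path.home() / ".pi" / "agent" / "models.json")` would apply the merge to the path named above; other providers in the file are left untouched.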

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use sombra-x/Kimi-K2.6-REAP-Solidity with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sombra-x/Kimi-K2.6-REAP-Solidity"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sombra-x/Kimi-K2.6-REAP-Solidity

Run Hermes

hermes

OpenClaw

How to use sombra-x/Kimi-K2.6-REAP-Solidity with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sombra-x/Kimi-K2.6-REAP-Solidity"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "sombra-x/Kimi-K2.6-REAP-Solidity" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

MLX LM

How to use sombra-x/Kimi-K2.6-REAP-Solidity with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "sombra-x/Kimi-K2.6-REAP-Solidity"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "sombra-x/Kimi-K2.6-REAP-Solidity"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "sombra-x/Kimi-K2.6-REAP-Solidity",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
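
The same request can be issued from Python with only the standard library. This is a minimal sketch, assuming the server is listening on `mlx_lm.server`'s default port 8080 (as in the agent configs above) and returns the usual OpenAI chat-completions response shape. The helper names `build_chat_request` and `chat`, and the 600-second timeout, are illustrative choices.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: mlx_lm.server default port
MODEL_ID = "sombra-x/Kimi-K2.6-REAP-Solidity"


def build_chat_request(user_message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message: str, timeout: float = 600.0) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_message), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` would print the reply. The generous timeout reflects that local prefill on a model of this size can take a while.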

Kimi-K2.6 REAP Solidity - Q4 MLX

This is a proof-of-concept Solidity-specialized Kimi-K2.6 checkpoint for local MLX inference on Apple Silicon. Kimi-K2.6 is released by Moonshot AI in a native INT4 / compressed-tensors format; this checkpoint preserves that Q4-class release format in MLX affine Q4 while applying REAP-style Mixture-of-Experts pruning. The pruning keeps the routed experts most active for the target workload, removes the least relevant routed experts, and preserves the rest of the model structure.

The goal is practical local use: retain useful smart-contract coding, investigation, and fork-test behavior while reducing the expert footprint enough to run on a single high-memory Mac Studio.

This release is intentionally framed against Moonshot AI hosted Kimi-K2.6 inference, not just against smaller local baselines. Kimi-K2.6 is a tier-1 open-weight MoE model, and its native INT4 format is produced by Moonshot's own quantization-aware post-training release process rather than by an aftermarket quantization that reduces quality. This checkpoint shows that a domain-specialized REAP variant can run locally on a Mac Studio M3 Ultra with 512 GB of unified memory and complete demanding Solidity-agent tasks in its target domain.

Built with mlx_fun, an MLX-native toolkit for MoE quantization, expert pruning, and expert-activation analysis.

Figure: REAP summed-rank heatmap for 60 MoE layers and 384 routed experts.

Each row is one of the 60 MoE layers; each column is one of the 384 routed experts in the base model. Lower summed rank indicates higher observed importance for the calibration workload. The released checkpoint keeps 256 of 384 routed experts per MoE layer.
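
As an illustration of how a summed rank can drive expert selection, here is a minimal Python sketch for a single layer. It assumes the rank is summed over several per-expert score lists (for example, separate calibration runs or metrics); what is actually summed in this release is not specified, and the function names are hypothetical.

```python
def summed_rank(score_lists):
    """Sum each expert's rank across score lists (rank 0 = highest score)."""
    n = len(score_lists[0])
    totals = [0] * n
    for scores in score_lists:
        # Experts ordered from most to least important for this score list.
        order = sorted(range(n), key=lambda e: scores[e], reverse=True)
        for rank, expert in enumerate(order):
            totals[expert] += rank
    return totals


def keep_lowest(totals, keep):
    """Indices of the `keep` experts with the lowest summed rank, ascending."""
    order = sorted(range(len(totals)), key=lambda e: totals[e])
    return sorted(order[:keep])


# Toy layer with 4 experts and 2 score lists; the real layers have 384 experts.
toy_scores = [[0.9, 0.1, 0.5, 0.3], [0.8, 0.2, 0.1, 0.7]]
toy_totals = summed_rank(toy_scores)
toy_kept = keep_lowest(toy_totals, keep=2)
```

For this checkpoint the analogous call would use `keep=256` on 384 totals, once per MoE layer.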

Model Details

| Field | Value |
| --- | --- |
| Base model | `moonshotai/Kimi-K2.6` |
| Format | MLX safetensors with custom Kimi K2.5/K2.6 code files |
| Quantization | MLX affine Q4, preserving the native INT4/Q4-class Kimi-K2.6 release format |
| Pruning method | REAP per-layer expert pruning by saliency |
| Source routed experts | 384 per MoE layer |
| Experts kept | 256 / 384 routed experts per MoE layer |
| Expert reduction | 128 experts removed per MoE layer, ~33% reduction |
| Layers | 61 total text layers, including 1 dense layer and 60 MoE layers |
| Selected experts per token | 8 |
| Context length | 262,144 tokens |
| Disk size | ~401 GB |
| Shards | 180 safetensors files |
| Build and evaluation machine | Mac Studio M3 Ultra, 512 GB unified memory |
| Recommended hardware | Apple Silicon with >= ~440 GB unified memory |

The number of experts kept is uniform across MoE layers, but the specific expert indices differ per layer. The router and related expert-indexed tensors are sliced to match the retained experts.
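
The slicing described above can be sketched in plain Python. This is an illustrative toy, not the code used to build the checkpoint: tensors are represented as nested lists with one row per expert, and the names `slice_rows` and `remap_index` are hypothetical.

```python
def slice_rows(rows, kept):
    """Keep only the rows of retained experts, preserving their order."""
    return [rows[i] for i in kept]


def remap_index(kept):
    """Map original expert index -> compact index within one pruned layer."""
    return {old: new for new, old in enumerate(kept)}


# Toy setup: 2 layers, 6 source experts, 4 kept per layer (uniform count,
# different indices), router rows of hidden size 3.
kept_per_layer = {0: [0, 2, 3, 5], 1: [1, 3, 4, 5]}
router = [[float(e)] * 3 for e in range(6)]
pruned = {layer: slice_rows(router, kept) for layer, kept in kept_per_layer.items()}

# Compact slot 1 points at original expert 2 in layer 0 but expert 3 in layer 1.
slot1_layer0 = kept_per_layer[0][1]
slot1_layer1 = kept_per_layer[1][1]
```

Because the compact index is only meaningful together with its layer, any per-expert analysis on the pruned checkpoint needs the per-layer list of retained indices.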

Intended Use

This checkpoint is intended for local, domain-focused Solidity work:

- Smart-contract review assistance.
- Solidity exploit investigation and vulnerability triage.
- Foundry-style fork-test generation and debugging.
- Research into MoE routing, expert saliency, and workload-specific pruning.

It is not intended to be a general-purpose replacement for the full Kimi-K2.6 model. Capability outside the Solidity and smart-contract engineering domain is expected to degrade relative to the unpruned base model.

Build Summary

The checkpoint was produced from Kimi-K2.6 by:

- Collecting routed-expert saliency from a focused Solidity-oriented workload.
- Converting the native INT4/Q4-class base release to MLX affine Q4.
- Removing the least-salient 128 routed experts from each MoE layer.
- Preserving dense layers, top-k routing shape, shared experts, and router/tensor alignment.

All conversion, saliency collection, pruning, and local evaluation work for this release was completed on a Mac Studio M3 Ultra with 512 GB unified memory. That hardware point matters: the result is a local, reproducible workstation workflow for a tier-1 open-weight MoE model, not a cloud-only experiment.

Private prompts, internal task traces, target contracts, and operational pipeline details are intentionally omitted from this public model card. To apply the same method to another workload, collect your own routed-expert activation statistics with mlx_fun, then prune the target checkpoint using the resulting per-layer saliency.
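
For readers new to the idea, the sketch below shows one way a routed-expert saliency statistic can be accumulated for a single MoE layer. It assumes a REAP-style score, taken here to be the mean of gate weight times expert output norm over the tokens routed to each expert. The exact statistic and API that mlx_fun uses are not described in this card, so the function name and the event format are hypothetical.

```python
from collections import defaultdict


def reap_saliency(events, num_experts):
    """Mean gate-weighted output norm per expert for one MoE layer.

    `events` yields (expert_id, gate_weight, output_norm), one entry per
    (token, selected expert) pair observed on the calibration workload.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for expert, gate, norm in events:
        total[expert] += gate * norm
        count[expert] += 1
    # Experts that were never selected get a saliency of 0.0.
    return [total[e] / count[e] if count[e] else 0.0 for e in range(num_experts)]


# Toy trace: 4 experts, expert 1 is never selected.
toy_events = [(0, 0.5, 2.0), (0, 0.25, 4.0), (2, 0.5, 1.0), (3, 0.125, 8.0)]
toy_saliency = reap_saliency(toy_events, num_experts=4)
```

Averaging over routed tokens, rather than summing, keeps a frequently selected expert from looking important purely because of its selection count.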

Evaluation Snapshot

The following results are from a representative end-to-end smart-contract audit workload with multiple agent stages. They compare this pruned local Q4 checkpoint against a same-base local Q3 checkpoint and OpenRouter-routed Moonshot AI Kimi-K2.6 inference.

These are workload snapshots, not a formal benchmark suite, provider benchmark, or security certification. Hardware for the local runs was a Mac Studio M3 Ultra with 512 GB unified memory. The target details and internal pipeline names are intentionally abstracted in this public card.

Exploit-Investigation Stage

This stage asks the agent to reason through the exploit path and emit the confirmed vulnerability finding. The fully cold REAP row has no external knowledge priming.

| Configuration | Outcome | Wall time | Input tokens | Output tokens | Total tokens | Tool calls |
| --- | --- | --- | --- | --- | --- | --- |
| OpenRouter-routed Moonshot AI Kimi-K2.6 | Critical canonical | 293s | 293,191 | 7,799 | 300,990 | 19 |
| Local Q3 baseline | Critical canonical | 1,665s | 659,769 | 13,721 | 673,490 | 21 |
| This model, local Q4 REAP, fully cold | Confirmed critical | 1,498s | 267,829 | 9,613 | 277,442 | 17 |

On this exploit-investigation stage, this model uses ~59% fewer total tokens and ~19% fewer tool calls than the local Q3 baseline while reaching the same canonical outcome. Compared with OpenRouter-routed Moonshot AI inference, it uses ~8% fewer total tokens and fewer tool calls (17 vs. 19), but wall-clock time is about 5x longer (1,498s vs. 293s) because local Apple Silicon prefill and generation throughput is lower.
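
The relative figures quoted above follow directly from the table. A short Python check, using only the numbers reported in this card (the dictionary keys are illustrative labels):

```python
def pct_fewer(value, baseline):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


runs = {
    "hosted": {"total_tokens": 300_990, "tool_calls": 19},
    "local_q3": {"total_tokens": 673_490, "tool_calls": 21},
    "reap_q4": {"total_tokens": 277_442, "tool_calls": 17},
}

reap = runs["reap_q4"]
for name in ("local_q3", "hosted"):
    base = runs[name]
    tokens = pct_fewer(reap["total_tokens"], base["total_tokens"])
    calls = pct_fewer(reap["tool_calls"], base["tool_calls"])
    print(f"vs {name}: {tokens:.1f}% fewer tokens, {calls:.1f}% fewer tool calls")
# vs local_q3: 58.8% fewer tokens, 19.0% fewer tool calls
# vs hosted: 7.8% fewer tokens, 10.5% fewer tool calls
```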

Throughput Snapshot

| Configuration | Output throughput | Prefill throughput |
| --- | --- | --- |
| This model, local Q4 REAP | ~10-13 tok/s | ~300 tok/s |
| Moonshot AI hosted Kimi-K2.6 inference | ~23-31 tok/s | ~2,000 tok/s |

Moonshot AI hosted inference is faster in wall-clock terms. The point of REAP here is different: keep a tier-1 open-weight model useful for a high-value domain while running locally, privately, and repeatably on one Mac Studio.

Quickstart

LM Studio

Place the directory under:

~/.lmstudio/models/<owner>/Kimi-K2.6-REAP-Solidity

Enable Trust Remote Code in LM Studio. The checkpoint ships the required Moonshot Kimi custom modeling files.

`mlx-fun serve`

mlx-fun serve \
  --model /path/to/Kimi-K2.6-REAP-Solidity \
  --trust-remote-code \
  --idle-timeout 7200 \
  --default-top-k 100 \
  --default-repetition-penalty 1.05 \
  --prompt-cache-size 1

The included chat_template.jinja handles Kimi tool calls and reasoning blocks.

Limitations

- This is not a fine-tune. The model weights were not trained further; routed experts were removed based on observed workload saliency.
- This is not a security-certified auditor. It can miss vulnerabilities, invent false positives, or generate unsafe assumptions. Human review is required for smart-contract findings.
- The evaluation above is a small workload snapshot, not a broad benchmark corpus. Test on your own contracts, vulnerability classes, and agent setup before relying on the model.
- The Q4 wording here refers to preserving Kimi-K2.6's native INT4/Q4-class release format in MLX. It should not be read as an additional aftermarket low-bit compression step from a BF16 checkpoint.
- Saliency was collected from a lower-bit capture checkpoint and then applied to the Q4 target. Routing saliency is generally stable across quantization levels, but stricter reproductions should collect saliency directly on their target quantization.
- Tooling that assumes expert index N has the same meaning across all layers will not apply cleanly. Expert retention is layer-specific.
- Kimi-K2.6 is multimodal, but only the text path was exercised during calibration and evaluation. Vision inputs have not been measured after pruning.

License

This checkpoint inherits the base model's Modified MIT terms. See moonshotai/Kimi-K2.6 and the included LICENSE file for the original terms.
