kimi-k2-instruct-vindex

Per-expert gate-vector vindex for moonshotai/Kimi-K2-Instruct, built by the Divinci-AI team for use with LarQL (Chris Hay) and adjacent feature-routing inference research.

Vindex specs

  • Source model: moonshotai/Kimi-K2-Instruct
  • Architecture: kimi_k2 (61 layers, 7168 hidden, 2048 moe_intermediate)
  • Experts: 384 routed + 1 shared, 8 per token
  • Layers indexed: 60 MoE layers (L01-L60)
  • Features per expert: 64 (top-K right singular vectors of gate_proj)
  • Format: float32, mmap-friendly contiguous binary
  • Total size: 42.28 GB
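The stated total size follows directly from the layout dimensions above; a quick arithmetic check:

```python
# Sanity check: size implied by the layout [moe_layers, n_experts,
# num_feats, hidden_size] in float32 (4 bytes per element).
moe_layers, n_experts, num_feats, hidden_size = 60, 384, 64, 7168
total_bytes = moe_layers * n_experts * num_feats * hidden_size * 4
print(f"{total_bytes / 1e9:.2f} GB")  # 42.28 GB
```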

What this is

  • gate_vectors.bin — flat float32 binary, layout [moe_layers, n_experts, num_feats, hidden_size]. Each per-expert chunk is the top-64 right singular vectors (Vt[:K, :]) of that expert's gate_proj weight after fp8/MXFP4 dequantization.
  • gate_vectors_index.json — sidecar with per-layer file_offset (bytes), shape, and SVD stats (median_var64, q25_var64, q75_var64). Lookup table for mmap.
  • phase1_moe_svd.json — full per-layer Phase 1 stats (routed/shared/router decomposition).
  • phase2_router_svd.json — router weight SVD per layer (top-K variance, effective rank, s0/s1 ratio).
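For context, a minimal sketch of how one per-expert chunk could be produced from a dequantized gate_proj weight. Dimensions are shrunk so the SVD runs instantly (real shape is 2048 x 7168 with K = 64); the actual builder is moe_vindex_builder.py and may differ in detail.

```python
import numpy as np

# Toy stand-in for a dequantized gate_proj weight (moe_intermediate x hidden)
rng = np.random.default_rng(0)
gate_w = rng.standard_normal((128, 256)).astype(np.float32)

K = 8  # the shipped vindex uses K = 64
_, _, Vt = np.linalg.svd(gate_w, full_matrices=False)
chunk = Vt[:K, :]  # top-K right singular vectors, one (K, hidden) chunk
print(chunk.shape)  # (8, 256)
```

Rows of Vt are orthonormal, which is why the L2-norm check in the usage snippet below comes out ≈ 1.0.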

What this is not

  • Not a runnable model (no inference path on its own).
  • Not raw weights — only the top-K right singular vectors of gate_proj; the singular values are not retained, so reconstruction is lossy.
  • Not a fine-tune or quantization of the base model.
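To make the lossiness concrete: even the best rank-K reconstruction requires U and the singular values, which are not shipped, and it still differs from the original weight. A toy illustration (names and dimensions are illustrative):

```python
import numpy as np

# Rank-K truncated SVD discards information; without U and S even this
# approximation cannot be rebuilt from Vt[:K] alone.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)).astype(np.float32)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
K = 8
W_rankK = (U[:, :K] * S[:K]) @ Vt[:K, :]  # best rank-K approximation
rel_err = np.linalg.norm(W - W_rankK) / np.linalg.norm(W)
print(f"relative error at rank {K}: {rel_err:.2f}")  # > 0: lossy
```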

Usage

import json
import numpy as np

# Load the sidecar index and memory-map the binary
idx = json.load(open("gate_vectors_index.json"))
arr = np.memmap("gate_vectors.bin", dtype=np.float32, mode="r")

moe = idx["model_config"]["moe"]
n_experts = moe["n_routed_experts"]
n_feats = idx["num_feats"]
hidden = moe["hidden_size"]

# Get layer L's experts
def get_layer(L):
    meta = idx["layers"][str(L)]
    offset = meta["file_offset"] // 4  # bytes -> float32 elements
    n = n_experts * n_feats * hidden
    return arr[offset:offset + n].reshape(n_experts, n_feats, hidden)

V_L1 = get_layer(1)  # shape (n_experts, n_feats, hidden)
print("L1 expert 0 top vector L2 norm:", np.linalg.norm(V_L1[0, 0]))  # ≈ 1.0
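One hedged example of downstream use: rank experts by how much of a hidden state's energy falls along their top singular directions. The arrays below are toy stand-ins (shrunk dims) for get_layer(1) and a real hidden state; this is an illustration, not the LarQL routing rule.

```python
import numpy as np

# Toy stand-ins: V_L1 would come from get_layer(1), h from a forward pass
rng = np.random.default_rng(0)
n_experts, n_feats, hidden = 16, 8, 64
V_L1 = rng.standard_normal((n_experts, n_feats, hidden)).astype(np.float32)
h = rng.standard_normal(hidden).astype(np.float32)

scores = np.square(V_L1 @ h).sum(axis=1)  # per-expert projection energy
top8 = np.argsort(scores)[-8:][::-1]      # mimic 8-experts-per-token routing
print(top8.shape)  # (8,)
```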

Citation

If you use this vindex in research, please cite:

@misc{divinci_kimi_k2_instruct_vindex_2026,
  title  = {kimi-k2-instruct-vindex: per-expert gate-vector vindex for moonshotai/Kimi-K2-Instruct},
  author = {Divinci-AI},
  year   = {2026},
  url    = {https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex},
}

Built using moe_vindex_builder.py.
