Buckets: sky-meilin/AnyCoder

81.5 GB · 30 files · Updated 8 days ago

Name	Size	Uploaded	Xet hash
docs		about 1 month ago	1 item
.gitattributes	1.68 kB	19 days ago	7eccd710
LICENSE	1.51 kB	about 1 month ago	11bd7a3e
README.md	3.15 kB	19 days ago	20770fb1
app.py	2.68 kB	about 1 month ago	d5fbb8ff
chat_template.jinja	7.76 kB	about 1 month ago	89948697
config.json	4.13 kB	about 1 month ago	3155dca3
configuration.json	48 Bytes	about 1 month ago	fb123513
configuration_deepseek.py	12.5 kB	about 1 month ago	d9b86770
diffusion_pytorch_model.safetensors	25.9 GB	about 1 month ago	eddc23e7
generation_config.json	244 Bytes	about 1 month ago	06741e0f
merges.txt	3.35 MB	about 1 month ago	ee5016b0
model.safetensors-00001-of-00011.safetensors	5.26 GB	about 1 month ago	a5faa412
model.safetensors-00002-of-00011.safetensors	5.35 GB	about 1 month ago	29682c10
model.safetensors-00003-of-00011.safetensors	5.35 GB	about 1 month ago	0155a4f9
model.safetensors-00004-of-00011.safetensors	5.35 GB	about 1 month ago	b9f052ab
model.safetensors-00005-of-00011.safetensors	5.35 GB	about 1 month ago	8de595a9
model.safetensors-00006-of-00011.safetensors	5.35 GB	about 1 month ago	3e9f075c
model.safetensors-00007-of-00011.safetensors	5.35 GB	about 1 month ago	7503d7f9
model.safetensors-00008-of-00011.safetensors	5.37 GB	about 1 month ago	eac5fe0b
model.safetensors-00009-of-00011.safetensors	5.35 GB	about 1 month ago	13db85af
model.safetensors-00010-of-00011.safetensors	5.35 GB	about 1 month ago	5bf3e309
model.safetensors-00011-of-00011.safetensors	2.15 GB	about 1 month ago	7ac20b3a
model.safetensors.index.json	127 kB	about 1 month ago	27881d33
modeling_deepseek.py	47.5 kB	about 1 month ago	7c62c414
preprocessor_config.json	390 Bytes	about 1 month ago	2fd90549
tokenizer.json	12.8 MB	about 1 month ago	8f06040c
tokenizer_config.json	16.7 kB	about 1 month ago	824f47b0
video_preprocessor_config.json	385 Bytes	about 1 month ago	da843e00
vocab.json	6.72 MB	about 1 month ago	88980fb1

README.md

InstinctRazor — Qwen3.5-122B-A10B · IQ3_XXS GGUF

A sub-4-bit quantization of Qwen3.5-122B-A10B, a 122B hybrid Gated-DeltaNet MoE (256 experts, 8 active), with the routed experts at ≈3 bpw. Packed to 48 GiB, it runs on a single 80 GB GPU (or on a small card with CPU offload). Quantized with llama.cpp from the original BF16 weights, using an importance matrix built from math, code, and general calibration data.
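As a rough check on these sizes, the sketch below converts the 48 GiB file size to decimal GB and divides by the parameter count to get the average bits stored per weight. It assumes exactly 122e9 parameters (the nominal "122B"; the real count may differ slightly) and takes IQ3_XXS as 3.0625 bpw, from llama.cpp packing 256 weights into a 98-byte block. The gap between that figure and the whole-file average would come mostly from the tensors kept at higher precision.

```python
GIB = 2**30  # bytes per GiB (binary unit used for the file size)
GB = 10**9   # bytes per GB (decimal unit used for the Gemma comparison)

# Assumption: exactly 122e9 parameters; the real count may differ slightly.
N_PARAMS = 122e9
FILE_BYTES = 48.0 * GIB

# IQ3_XXS: 256 weights stored in a 98-byte block -> bits per weight.
IQ3_XXS_BPW = 98 * 8 / 256


def average_bpw(file_bytes: float, n_params: float) -> float:
    """Average bits stored per parameter, counting every byte in the file."""
    return file_bytes * 8 / n_params


print(f"file size:   {FILE_BYTES / GB:.1f} GB")
print(f"average bpw: {average_bpw(FILE_BYTES, N_PARAMS):.2f}")
print(f"IQ3_XXS bpw: {IQ3_XXS_BPW:.4f}")
```

With these inputs the file works out to about 51.5 GB and roughly 3.4 bits per weight on average, consistent with the ≈52 GB footprint-matched comparison.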

Framework, recipe, and full reproduction: https://github.com/General-Instinct/InstinctRazor

Files

file	size	notes
`InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf`	48.0 GiB	text model — routed experts IQ3_XXS (≈3.06 bpw)
`InstinctRazor-Qwen3.5-122B-A10B-mmproj-f16.gguf`	0.8 GiB	vision projector (mmproj) for multimodal via `--mmproj`

Protected recipe: routed experts IQ3_XXS · shared-expert int8 · attention int4 · router + Gated-DeltaNet/SSM f16 · embed + lm_head q8_0.
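One way to read the recipe is as an ordered set of rules mapping tensor names to storage types. The sketch below is hypothetical: the name patterns follow llama.cpp's usual GGUF naming for MoE models (`ffn_*_exps`, `ffn_*_shexp`, `ffn_gate_inp`, `attn_*`, `ssm_*`, `token_embd`, `output`) but were not read from this file, the real Gated-DeltaNet tensor names may differ, and the `f32` fallback for everything else (norms and other small tensors) is an assumption, not part of the stated recipe.

```python
import re

# Hypothetical rules: regex on GGUF tensor name -> type named in the recipe.
# Patterns are assumed llama.cpp-style names; first match wins.
RECIPE = [
    (r"ffn_gate_inp", "f16"),                # router (also any shared-expert gate)
    (r"ssm_", "f16"),                        # Gated-DeltaNet / SSM tensors
    (r"ffn_(gate|up|down)_shexp", "int8"),   # shared expert
    (r"ffn_(gate|up|down)_exps", "IQ3_XXS"), # routed experts
    (r"attn_(q|k|v|output)\.", "int4"),      # attention projections
    (r"^(token_embd|output)\.", "q8_0"),     # embedding + lm_head
]

# Assumption: tensors not covered by the recipe (e.g. norms) stay in f32.
FALLBACK = "f32"


def pick_type(tensor_name: str) -> str:
    """Return the storage type for a tensor under the rules above."""
    for pattern, qtype in RECIPE:
        if re.search(pattern, tensor_name):
            return qtype
    return FALLBACK


for name in ["blk.0.ffn_down_exps.weight", "blk.0.ffn_up_shexp.weight",
             "blk.0.attn_q.weight", "output.weight", "output_norm.weight"]:
    print(f"{name:32s} -> {pick_type(name)}")
```

The router rule sits first so that a shared-expert gate tensor is treated as routing rather than as shared-expert weights.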

Quality (same-harness, vs the footprint-matched Gemma-4-26B-A4B ≈52 GB)

benchmark	this GGUF	Gemma-4-26B-A4B	note
MMLU-Pro (n=150)	90.7	85.6	≥ A4B, no truncated responses
GPQA-Diamond (n=198)	80.8	79.3	≥ A4B, no truncated responses

Scores track the weight-only fake-quant capability ceiling (MMLU-Pro 88.5–90) to within noise.
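To put "within noise" in numbers, the binomial standard error of an accuracy p measured on n questions is sqrt(p(1 - p)/n). The sketch below applies it to this GGUF's two scores. It assumes independent questions and one sampled answer per question, so it ignores generation randomness and any harness differences; treat it as a rough yardstick only.

```python
import math


def binomial_se(acc: float, n: int) -> float:
    """Standard error of an accuracy measured on n independent questions."""
    return math.sqrt(acc * (1 - acc) / n)


# This GGUF's scores and sample sizes, from the table above.
SCORES = {"MMLU-Pro": (0.907, 150), "GPQA-Diamond": (0.808, 198)}

for name, (acc, n) in SCORES.items():
    se = binomial_se(acc, n)
    # Approximate 95% interval under the normal approximation: acc +/- 1.96 * SE.
    print(f"{name}: {100 * acc:.1f} +/- {100 * 1.96 * se:.1f} pts "
          f"(SE {100 * se:.1f} pts, n={n})")
```

At n = 150, one standard error on MMLU-Pro is about 2.4 points, so differences of a couple of points on that benchmark cannot be separated at this sample size.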

Speed (llama.cpp, this artifact)

1× H100-80GB, all layers on GPU: 115.9 tok/s decode (prefill ≈2541 tok/s).
Small card + CPU expert-offload (--n-cpu-moe 48, peak ≈7.6 GiB VRAM): 45.7 tok/s decode — runs on an 8 GB GPU + ≈48 GiB system RAM.
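The throughput figures translate into rough response times. The sketch below assumes constant prefill and decode speeds (in practice decode slows as the context grows) and uses a hypothetical 4,096-token prompt with a 512-token answer. No prefill figure is given for the offload setup, so only its decode time is estimated.

```python
def gen_seconds(prompt_tokens: int, new_tokens: int,
                prefill_tps: float, decode_tps: float) -> float:
    """Rough wall-clock time: prompt at prefill speed plus output at decode speed."""
    return prompt_tokens / prefill_tps + new_tokens / decode_tps


# Reported throughputs (tokens per second).
PREFILL_TPS = 2541.0        # full GPU, 1x H100-80GB
DECODE_TPS = 115.9          # full GPU, 1x H100-80GB
OFFLOAD_DECODE_TPS = 45.7   # small card + --n-cpu-moe 48

# Hypothetical workload: 4,096-token prompt, 512-token answer.
PROMPT, ANSWER = 4096, 512

print(f"full GPU, prompt + answer: {gen_seconds(PROMPT, ANSWER, PREFILL_TPS, DECODE_TPS):.1f} s")
print(f"full GPU, per token:       {1000 / DECODE_TPS:.2f} ms")
print(f"offload, answer only:      {ANSWER / OFFLOAD_DECODE_TPS:.1f} s")
```

Under these assumptions the full-GPU setup answers in about 6 s, while the offload setup needs roughly 11 s for the 512 output tokens alone.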

Run

```shell
# full GPU
llama-cli -m InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf -ngl 999 -fa on -p "Your prompt"
# small card + CPU offload (routed experts on CPU)
llama-cli -m InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf -ngl 999 --n-cpu-moe 48 -t 52 -p "Your prompt"
# multimodal (image input)
llama-cli -m InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf --mmproj InstinctRazor-Qwen3.5-122B-A10B-mmproj-f16.gguf --image pic.png -p "Describe the image"
```
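When scripting these runs, building the argument list in code avoids shell-quoting mistakes. The helper below is a hypothetical convenience (`llama_cli_args` is not part of llama.cpp); it only reproduces the three invocations above, flag for flag, and the commented `subprocess.run` line shows how it could be launched.

```python
import shlex
from typing import List, Optional

MODEL = "InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf"
MMPROJ = "InstinctRazor-Qwen3.5-122B-A10B-mmproj-f16.gguf"


def llama_cli_args(prompt: str, offload_experts: bool = False,
                   image: Optional[str] = None, threads: int = 52) -> List[str]:
    """Hypothetical helper: argv for one of the three llama-cli runs above."""
    args = ["llama-cli", "-m", MODEL]
    if image is not None:
        # Multimodal: attach the vision projector and the input image.
        args += ["--mmproj", MMPROJ, "--image", image]
    elif offload_experts:
        # Small card: offload all layers to the GPU, but keep the MoE
        # weights of the first 48 layers on the CPU.
        args += ["-ngl", "999", "--n-cpu-moe", "48", "-t", str(threads)]
    else:
        # Full GPU: offload all layers and enable flash attention.
        args += ["-ngl", "999", "-fa", "on"]
    return args + ["-p", prompt]


if __name__ == "__main__":
    cmd = llama_cli_args("Your prompt", offload_experts=True)
    print(shlex.join(cmd))
    # To launch it: subprocess.run(cmd, check=True)
```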

Requires a llama.cpp build with qwen3_5_moe support (upstream, 2026-02+).
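To see which architecture a GGUF declares (the `general.architecture` string llama.cpp looks up when deciding whether it can load a model), you can read it from the file header. The sketch below is a minimal reader written against the GGUF header layout (magic, version, tensor count, metadata count, then typed key/value pairs); it is not llama.cpp's or gguf-py's code, handles only GGUF v2 and later, and the exact architecture string stored in this file is not stated here.

```python
import struct
from typing import Any, BinaryIO

# GGUF metadata value types -> little-endian struct formats for scalars.
_SCALAR = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value with the given struct format."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_str(f: BinaryIO) -> str:
    """GGUF string: uint64 byte length followed by UTF-8 bytes."""
    return f.read(_read(f, "<Q")).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALAR:
        return _read(f, _SCALAR[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        # Array: uint32 element type, uint64 count, then the elements.
        elem_type, count = _read(f, "<I"), _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_architecture(f: BinaryIO) -> str:
    """Return general.architecture from a GGUF (v2+) header."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    _tensor_count, kv_count = _read(f, "<Q"), _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_str(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    raise KeyError("general.architecture not found")
```

Usage: `with open("InstinctRazor-Qwen3.5-122B-A10B-IQ3_XXS.gguf", "rb") as f: print(gguf_architecture(f))`. If your llama.cpp build reports an unknown model architecture for the printed value, the build is likely too old for this model.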

Scope & roadmap

This GGUF matches or beats the footprint-matched A4B on knowledge, reasoning, and multimodal (MMMU). It still trails on code (LiveCodeBench v6), math, and multimodal math; there the gap comes largely from token inefficiency introduced by quantization, which is the target of OPD (on-policy distillation), a separate framework we'll open-source later. Absolute eval numbers are subject to a same-harness validation gate; see results/RESULTS.md in the GitHub repository for full per-number provenance.

Attribution

Quantization recipe + framework: General Instinct, released under Apache-2.0.


Last updated: Jun 15

Pre-warmed CDN: US, EU