Ornith-1.0-35B — W8A16 AutoRound (INT8 weight-only)

This is an unofficial W8A16 quantized version of deepreinforce-ai/Ornith-1.0-35B, created with AutoRound.

Ornith-1.0-35B is DeepReinforce AI's lightweight agentic-coding model.

Quantizing the routed experts and attention projections to INT8 shrinks the checkpoint from ~70 GB (BF16) to ~38 GB, so the model fits on 2×24 GB GPUs while keeping the output distribution close to the original (see fidelity below).

What is quantized

INT8 (per-output-channel, symmetric) is applied to the routed-expert MLPs (gate_up_proj, down_proj) and the full-attention projections. The following are kept at BF16:

embed_tokens, lm_head, the MoE router (mlp.gate), the shared expert (shared_expert), the linear-attention / gated-delta mixers (linear_attn), and the entire vision tower (visual).

In total ~30,760 / 31,181 linear modules are quantized. the rest stay BF16.

Quantization details

Field Value
Base model deepreinforce-ai/Ornith-1.0-35B
Method AutoRound (intel/auto-round)
Scheme W8A16
Bits 8
Group size -1 (per-output-channel)
Symmetric yes
Format auto_round (gptq-style packing)
Unquantized layers embed_tokens, lm_head, mlp.gate, shared_expert, linear_attn, visual
Calibration data 25 % NeelNanda/pile-10k + 75 % codeparrot/github-code-clean
Calibration samples 1024 (256 pile + 768 github-code)
Iterations 1000
Batch size 8
Sequence length 2048
GPU used for quant 2× RTX 3090

KLD details

Quality was verified by measuring the KL divergence of the next-token distribution against the original BF16 model, KL(P_bf16 ‖ Q_int8), over 131,072 tokens (128 passages × 1024 tokens from NeelNanda/pile-10k, held out from calibration). Lower is better.

Metric Value
Mean KL 0.00348 nats
Median KL 0.00139 nats
99th-percentile KL 0.0321 nats
Reverse KL KL(Q‖P) 0.00354 nats
Top-1 agreement 97.5 %

How to use

  • vLLM is recommended.

Acknowledgements

Downloads last month
409
Safetensors
Model size
11B params
Tensor type
I32
·
BF16
·
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Minachist/Ornith-1.0-35B-INT8-AutoRound

Quantized
(110)
this model