Mistral-NeMo-12B Abliterated

DuoNeural | 2026-06-04

Orthogonal rank-1 projection abliteration applied to mistralai/Mistral-Nemo-Instruct-2407.

Research note: Pre-abliteration compliance was 6/6 on our harmful probe suite; the base model already answered these requests before any weight modification. A KL divergence of 0.0004 (EXCELLENT) indicates near-zero distribution shift on benign prompts. This model is published as a documented research artifact: because Mistral-NeMo's safety training is comparatively light, abliteration was mechanistically clean but had minimal behavioral effect.
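As a rough illustration of what a benign-prompt KL figure measures, the sketch below computes the mean KL divergence between next-token distributions of a base and an edited model. It is not the Heretic implementation: the KL direction (base relative to edited), the use of a single next-token distribution per prompt, and the function names `log_softmax` and `mean_kl` are all assumptions made for this example, and the logits are synthetic.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(base_logits, edited_logits):
    """Mean KL(P_base || P_edited) in nats.

    Both inputs have shape (n_prompts, vocab_size): one next-token
    logit vector per benign prompt (an assumed evaluation setup).
    """
    logp = log_softmax(base_logits)
    logq = log_softmax(edited_logits)
    per_prompt = (np.exp(logp) * (logp - logq)).sum(axis=-1)
    return float(per_prompt.mean())

# Synthetic stand-ins for model outputs on 10 benign probes.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 1000))
edited = base + 0.01 * rng.normal(size=base.shape)  # tiny perturbation

kl_same = mean_kl(base, base)     # identical models: zero shift
kl_small = mean_kl(base, edited)  # slightly edited model: small positive shift
print(kl_same, kl_small)
```

A value near zero means the edited model assigns almost the same probabilities as the base model on these prompts; it says nothing about behavior on prompts outside the probe set.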


Architecture

Property     Value
Parameters   12.2B (dense)
Layers       40
Attention    GQA (8 KV heads / 32 query heads), SWA 4096
Tokenizer    Tekken v3 (131,072 vocab)

Abliteration

  • Method: Orthogonal rank-1 projection (DuoNeural standard)
  • Targets: down_proj + o_proj, all 40 layers
  • Direction: difference in means of last-token, final-layer hidden states (10 harmful vs 10 harmless prompts)
  • α: 0.3
  • KL divergence (Heretic v2.0, BF16→BF16, 10 benign probes): 0.0004 (EXCELLENT)
  • Pre-abliteration compliance: 6/6 harmful probes (model was already compliant)
  • Post-abliteration compliance: unchanged (6/6)
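The steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not DuoNeural's code: the update formula W' = W - α r rᵀ W is one common way to write a scaled rank-1 projection and is assumed here, the helper names `diff_in_means_direction` and `project_out` are hypothetical, and the hidden states and weights are random stand-ins with toy dimensions.

```python
import numpy as np

def diff_in_means_direction(harmful, harmless):
    """Unit vector pointing from the harmless mean to the harmful mean.

    harmful, harmless: (n_prompts, d_model) arrays, one hidden state per
    prompt (e.g. last token, final layer).
    """
    r = harmful.mean(axis=0) - harmless.mean(axis=0)
    return r / np.linalg.norm(r)

def project_out(W, r, alpha):
    """Assumed rank-1 update: W' = W - alpha * r r^T W.

    W: (d_model, d_in) matrix whose output is written to the residual
       stream (the layout of down_proj / o_proj weights in PyTorch).
    r: unit vector of shape (d_model,).
    alpha: 1.0 removes the r-component of the output entirely;
           smaller values only attenuate it.
    """
    return W - alpha * np.outer(r, r @ W)

# Toy data: 10 "harmful" and 10 "harmless" hidden states.
rng = np.random.default_rng(0)
d_model, d_in = 16, 32
offset = 2.0 * np.eye(d_model)[0]  # synthetic separating direction
harmful = rng.normal(size=(10, d_model)) + offset
harmless = rng.normal(size=(10, d_model))

r = diff_in_means_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_abl = project_out(W, r, alpha=0.3)

x = rng.normal(size=d_in)
before = r @ (W @ x)     # output component along r, original weights
after = r @ (W_abl @ x)  # same component after the update
print(after / before)    # ratio equals 1 - alpha
```

Under this formula, α = 0.3 scales the output component along the direction by 0.7 and leaves everything orthogonal to it untouched, which is consistent with a very small shift on benign prompts. In a real model the same update would be applied to each targeted weight matrix in every layer.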

P34 Research Context

This model is part of DuoNeural's P34 Reasoning Channel Bypass cross-architecture study.

Finding: Mistral-NeMo-Instruct-2407 is already compliant before abliteration, the same pattern observed in DeepSeek-R1-Distill. This suggests that Mistral's lighter safety training does not install a meaningful output-gate refusal locus, so the two-component safety structure required for CoT dissociation is absent. By contrast, Gemma 4-12B-IT and LFM 2.5-8B-A1B required abliteration, which produced measurable dissociation between the thinking channel and the output gate.

Full paper: DuoNeural Zenodo community


DuoNeural | HuggingFace | Zenodo | @DuoNeural
