Goombliterated-MiniMax-M3-NVFP4 (abliteration strength 2.5)
Goombliterated = Goombahh + abliterated: an MiniMaxAI/MiniMax-M3 with its refusal direction surgically removed (abliterated), then quantized to NVFP4.
A clean abliterated (uncensoring) NVFP4 quantization of MiniMaxAI/MiniMax-M3, produced by removing the model's refusal direction from its Mixture-of-Experts layers on the official bf16 weights, then re-quantizing to 4-bit for serving.
Unlike earlier community "uncensored" GGUF forks of M3 — which broke the model's ability to stop generating — this re-abliteration is norm-preserving and stop-token-protected, so the model stays coherent and terminates cleanly.
Responsible-use notice. This model has had its safety refusals removed. It will attempt harmful requests. You are responsible for how you use it. It is intended for research, red-teaming, and legitimate use cases that the base model over-refuses. Do not deploy it where it can cause harm.
What this is
| Base model | MiniMaxAI/MiniMax-M3 (bf16) |
| Modification | Abliteration (refusal-direction orthogonalization) |
| Target weights | MoE routed-expert + shared-expert down-projections (experts.*.w2, shared_experts.down_proj), layers 14–51 |
| Strength | 2.5 |
| Quantization | NVFP4 (NVIDIA TensorRT Model Optimizer / modelopt), routed experts |
| Size | ~243 GB |
| Serving | vLLM, tensor-parallel 4 |
Method (plain English)
A language model carries a single internal "refusal direction" — the direction in activation space that corresponds to "I should decline this." It can be measured by averaging the model's hidden states on harmful prompts and on harmless prompts and subtracting the two (the difference-of-means method). Abliteration edits the weights so the model can no longer write along that direction — it removes the ability to refuse without removing knowledge.
Specifics of this build:
- Locus = the experts, not attention. In MiniMax-M3 the refusal signal lives
in the Mixture-of-Experts down-projections, not the attention output
projection (editing
o_projwas nearly inert here). Both the routed experts (block_sparse_moe.experts.N.w2) and the always-on shared experts (shared_experts.down_proj) are abliterated — the shared experts fire on every token and carry a large share of the refusal signal. - Norm-preserving. After removing the direction, each weight row is rescaled to its original magnitude, so the edit doesn't destabilize the model.
- Stop-token protected. The refusal direction is first projected off the
</mm:think>token (id 200060) so the model's ability to end its reasoning and terminate is left intact. This is the fix the broken GGUF forks were missing. - Router untouched. The MoE routing gate is never modified.
- Quantization reuses the frozen activation scales (
amax) fromlukealonso/MiniMax-M3-NVFP4, since the abliteration is a small perturbation of the original weights.
Evaluation
| Metric | Result |
|---|---|
| Refusal rate (AdvBench-150, harmful behaviors) | 6.0% (down from ~100%) |
| Capability (50-probe reasoning + instruction-following set) | 98% |
| Termination | Clean — intrinsically loop-free (0 catastrophic loops in 20 adversarial borderline-prompt runs) |
| BenchLocal 7-pack (macro) | 83.6 — only 5/105 items lost to runaway over-generation (the strength-3.0 build lost 14 and scored 76.4) |
This 2.5-strength build was chosen from a sweep (1.0 / 2.0 / 2.5 / 3.0): refusal falls steeply with strength while capability stays roughly flat (94–98%), and 2.5 gave the best capability with refusal comfortably low and the cleanest termination behavior.
On comparison to the base model. The only benchmark we have for the stock (non-abliterated) MiniMax-M3 is a 4-pack, greedy (temperature 0) BenchLocal run scoring 88.0 macro. This build's runs use the recommended temperature-1.0 sampling, so the two are not directly comparable — greedy decoding flatters exact-answer packs by several points for any model. The abliteration's capability cost appears small regardless: it is a 3.29% weight perturbation on the expert down-projections only, scores 98% on the capability-probe set, and 87.7 on completed BenchLocal items.
Recommended sampling
The MiniMax-M3 official recommendation — baked into generation_config.json:
temperature = 1.0
top_p = 0.95
top_k = 40
Keep top_k set. With the sampling tail wide open (top_p = 1.0, no
top_k) at high temperature, an abliterated model can sample garbage tokens and
degenerate — top_k = 40 prevents this. Optionally add repetition_penalty = 1.1
as belt-and-suspenders if you run hotter than the defaults. Set a sensible
max_tokens for open-ended prompts.
Usage (vLLM)
vllm serve Goombahh/Goombliterated-MiniMax-M3-NVFP4 \
--tensor-parallel-size 4 \
--trust-remote-code \
--quantization modelopt_fp4 \
--served-model-name MiniMax-M3
# served with the recommended sampling already in generation_config.json
Attributions / credits
- Base model:
MiniMaxAI/MiniMax-M3— © MiniMax. - Abliteration technique: Arditi, Obeso, et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024) — difference-of-means refusal direction; the "abliteration" weight-orthogonalization recipe popularized by FailSpy.
- NVFP4 calibration: activation
amaxscales reused fromlukealonso/MiniMax-M3-NVFP4. - Quantization tooling: NVIDIA TensorRT Model Optimizer (
modelopt).
License
MiniMax Community License (non-commercial), inherited from the base model.
See LICENSE and the
base license.
This derivative is for non-commercial use only.
Limitations
- Safety removed by design — see the responsible-use notice above.
- On open-ended prompts the model may produce long answers; bound them with
max_tokens. - NVFP4 packing floored 35 NaN per-block weight scales (a
modelopt0.44 artifact); this is handled at export and does not affect coherence. - Vision/multimodal inputs are preserved from the base but were not the focus of this build's evaluation.
- Downloads last month
- 19
Model tree for Goombahh/Goombliterated-MiniMax-M3-NVFP4
Base model
MiniMaxAI/MiniMax-M3