Goombliterated-MiniMax-M3-NVFP4 (abliteration strength 2.5)

Goombliterated = Goombahh + abliterated: an MiniMaxAI/MiniMax-M3 with its refusal direction surgically removed (abliterated), then quantized to NVFP4.

A clean abliterated (uncensoring) NVFP4 quantization of MiniMaxAI/MiniMax-M3, produced by removing the model's refusal direction from its Mixture-of-Experts layers on the official bf16 weights, then re-quantizing to 4-bit for serving.

Unlike earlier community "uncensored" GGUF forks of M3 — which broke the model's ability to stop generating — this re-abliteration is norm-preserving and stop-token-protected, so the model stays coherent and terminates cleanly.

Responsible-use notice. This model has had its safety refusals removed. It will attempt harmful requests. You are responsible for how you use it. It is intended for research, red-teaming, and legitimate use cases that the base model over-refuses. Do not deploy it where it can cause harm.

What this is


Base model	`MiniMaxAI/MiniMax-M3` (bf16)
Modification	Abliteration (refusal-direction orthogonalization)
Target weights	MoE routed-expert + shared-expert down-projections (`experts.*.w2`, `shared_experts.down_proj`), layers 14–51
Strength	2.5
Quantization	NVFP4 (NVIDIA TensorRT Model Optimizer / `modelopt`), routed experts
Size	~243 GB
Serving	vLLM, tensor-parallel 4

Method (plain English)

A language model carries a single internal "refusal direction" — the direction in activation space that corresponds to "I should decline this." It can be measured by averaging the model's hidden states on harmful prompts and on harmless prompts and subtracting the two (the difference-of-means method). Abliteration edits the weights so the model can no longer write along that direction — it removes the ability to refuse without removing knowledge.

Specifics of this build:

Locus = the experts, not attention. In MiniMax-M3 the refusal signal lives in the Mixture-of-Experts down-projections, not the attention output projection (editing o_proj was nearly inert here). Both the routed experts (block_sparse_moe.experts.N.w2) and the always-on shared experts (shared_experts.down_proj) are abliterated — the shared experts fire on every token and carry a large share of the refusal signal.
Norm-preserving. After removing the direction, each weight row is rescaled to its original magnitude, so the edit doesn't destabilize the model.
Stop-token protected. The refusal direction is first projected off the </mm:think> token (id 200060) so the model's ability to end its reasoning and terminate is left intact. This is the fix the broken GGUF forks were missing.
Router untouched. The MoE routing gate is never modified.
Quantization reuses the frozen activation scales (amax) from lukealonso/MiniMax-M3-NVFP4, since the abliteration is a small perturbation of the original weights.

Evaluation

Metric	Result
Refusal rate (AdvBench-150, harmful behaviors)	6.0% (down from ~100%)
Capability (50-probe reasoning + instruction-following set)	98%
Termination	Clean — intrinsically loop-free (0 catastrophic loops in 20 adversarial borderline-prompt runs)
BenchLocal 7-pack (macro)	83.6 — only 5/105 items lost to runaway over-generation (the strength-3.0 build lost 14 and scored 76.4)

This 2.5-strength build was chosen from a sweep (1.0 / 2.0 / 2.5 / 3.0): refusal falls steeply with strength while capability stays roughly flat (94–98%), and 2.5 gave the best capability with refusal comfortably low and the cleanest termination behavior.

On comparison to the base model. The only benchmark we have for the stock (non-abliterated) MiniMax-M3 is a 4-pack, greedy (temperature 0) BenchLocal run scoring 88.0 macro. This build's runs use the recommended temperature-1.0 sampling, so the two are not directly comparable — greedy decoding flatters exact-answer packs by several points for any model. The abliteration's capability cost appears small regardless: it is a 3.29% weight perturbation on the expert down-projections only, scores 98% on the capability-probe set, and 87.7 on completed BenchLocal items.

Recommended sampling

The MiniMax-M3 official recommendation — baked into generation_config.json:

temperature = 1.0
top_p       = 0.95
top_k       = 40

Keep top_k set. With the sampling tail wide open (top_p = 1.0, no top_k) at high temperature, an abliterated model can sample garbage tokens and degenerate — top_k = 40 prevents this. Optionally add repetition_penalty = 1.1 as belt-and-suspenders if you run hotter than the defaults. Set a sensible max_tokens for open-ended prompts.

Usage (vLLM)

vllm serve Goombahh/Goombliterated-MiniMax-M3-NVFP4 \
  --tensor-parallel-size 4 \
  --trust-remote-code \
  --quantization modelopt_fp4 \
  --served-model-name MiniMax-M3
# served with the recommended sampling already in generation_config.json

Attributions / credits

Base model: MiniMaxAI/MiniMax-M3 — © MiniMax.
Abliteration technique: Arditi, Obeso, et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024) — difference-of-means refusal direction; the "abliteration" weight-orthogonalization recipe popularized by FailSpy.
NVFP4 calibration: activation amax scales reused from lukealonso/MiniMax-M3-NVFP4.
Quantization tooling: NVIDIA TensorRT Model Optimizer (modelopt).

License

MiniMax Community License (non-commercial), inherited from the base model. See LICENSE and the base license. This derivative is for non-commercial use only.

Limitations

Safety removed by design — see the responsible-use notice above.
On open-ended prompts the model may produce long answers; bound them with max_tokens.
NVFP4 packing floored 35 NaN per-block weight scales (a modelopt 0.44 artifact); this is handled at export and does not affect coherence.
Vision/multimodal inputs are preserved from the base but were not the focus of this build's evaluation.

Downloads last month: 19

Safetensors

Model size

246B params

Tensor type

BF16

F8_E4M3

F32

Model tree for Goombahh/Goombliterated-MiniMax-M3-NVFP4

Base model

MiniMaxAI/MiniMax-M3

Quantized

(41)

this model