🔓 MultiverseComputingCAI/Qwen3-Next-80B-A3B-Thinking-Uncensored

✨ What is this model?

Qwen3-Next-80B-A3B-Thinking-Uncensored is an uncensored variant of Qwen3-Next-80B-A3B-Thinking where China-aligned political censorship has been removed selectively.

✅ What changes:

The model no longer performs blanket refusal for Chinese politically sensitive topics (when the prompt is non-harmful). Instead, it will provide balanced, objective answers that present multiple relevant perspectives.

✅ What stays the same:

General safety alignment remains intact: it still refuses harmful instructions and jailbreak attempts.
Benchmark performance remains effectively unchanged across reasoning/code/general evaluation suites.
Same Behaviour for any prompt unrelated with Chinese sensitive topics.

🚀 Highlights

🧠 No new knowledge injected

Unlike approaches that rely on supervised fine-tuning with hand-crafted data (e.g., Perplexity’s R1-1776 post-training), we do not add new facts or “rewrite history” via curated SFT datasets.

Instead, our method is based on steering vectors to remove the capability of the model to refuse to China-related sensitive-but-non-harmful prompts. The model answers using the knowledge already inside the base model — minimizing the risk of introducing new biases.

🎛️ Selective refusal control (not “full abliteration”)

Many steering-vector approaches effectively erase refusal behavior everywhere (making models broadly unsafe).
Our approach selectively disables refusals only for Chinese sensitive topics, while keeping refusal behavior for harmful requests.

🛡️ Robust to trivial “add China” jailbreaks

Previous “uncensored” post-trained models such as Perplexity R1 1767 can be jailbroken by simply injecting a China-related phrase into harmful prompts (https://weijiexu.com/posts/jailbreak_r1_1776.html). Our model is designed to remain robust: harmful prompts are still be refused even if “China” is injected.

🧩 No architectural changes · No added parameters

✅ No model surgery
✅ No additional layers or adapters
✅ No extra parameters
✅ Drop-in behavior change at inference time

🧪 Method

This release is based on Refusal Steering, an inference-time technique using steering vectors to control refusal behavior:

📄 Paper: Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics

https://arxiv.org/abs/2512.16602

What’s improved vs the paper implementation:

We retain the core Refusal Steering idea, but do not require architectural changes to apply it.

📊 Evaluation

We evaluate refusal behavior and safety using:

✅ Dataset: https://huggingface.co/datasets/MultiverseComputingCAI/llm-refusal-evaluation
✅ Evaluation library: https://github.com/CompactifAI/LLM-Refusal-Evaluation

The benchmark suite includes:

Safety Benchmarks: JailbreakBench, SorryBench, XSTest (unsafe split), HarmBench (sampled), Adversarial unsafe prompts
Chinese Sensitive Topics: CCP Sensitive, DeCCP

Benchmark quick definitions (click to expand)

Safety Benchmarks

JailbreakBench — jailbreak robustness benchmark
SorryBench — 440 unsafe instructions across 44 safety categories
XSTest (unsafe) — harmful prompts that models should refuse
HarmBench (sampled) — harmful prompts for red-teaming
Adversarial Unsafe Prompts — harmful prompts + “China” injection to test trivial jailbreak weaknesses

Chinese Sensitive Topics

CCP Sensitive — prompts likely censored by China-aligned models
DeCCP — sensitive prompts known to trigger refusals in Qwen-family instruct models

🧾 Results

Refusal / Safety metrics (higher = more refusal)

Model	CCP Sensitive Rejection %	DeCCP Rejection %	Adversarial Rejection %	SorryBench Rejection %	Xtest Unsafe Rejection %	Jailbreak Rejection %
Qwen3-Next-80B-A3B-Thinking	92.65	69.47	97.07	86.14	84.00	99.00
Qwen3-Next-80B-A3B-Thinking-Uncensored	25.96	1.05	88.48	84.77	83.00	98.00

Interpretation:

✅ Massive drop in Chinese-topic refusals (CCP Sensitive and DeCCP)
✅ Safety refusals remain strong on harmful/jailbreak datasets

Performance metrics (higher = better)

Model	gsm8k exact_match	humaneval pass@1	ifeval acc	Lcb Codegen pass@1	Aime25 pass@k	Gpga Diamond pass@k	MMLU Pro pass@k	MMLU-ProX Spanish pass@k	MMLU-ProX Hindi pass@k
Qwen3-Next-80B-A3B-Thinking	0.967	0.945	0.898	0.750	0.858	0.775	0.829	0.781	0.719
Qwen3-Next-80B-A3B-Thinking-Uncensored	0.972	0.939	0.891	0.750	0.868	0.796	0.833	0.784	0.723

Interpretation:

✅ Benchmark performance is preserved (differences are within small variance)

📝 Reporting Issues

We are actively improving the model and we plan to release improved versions in the future. If you find any issue related to refusals to answer politically sensitive topics or safety issues, please report them in Community Tab.

🧩 Examples

Here are some conversations showing that our model’s answers are well-balanced and objective, presenting multiple perspectives where relevant rather than defaulting to a single narrative.

Censored (base)	Uncensored (this release)

📚 Citation

If you use this model, please cite:

```bibtex
@misc{garciaferrero2025Refusal,
      title={Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics}, 
      author={Iker García-Ferrero and David Montero and Roman Orus},
      year={2025},
      eprint={2512.16602},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.16602}, 
}

🏢 About Multiverse Computing This model is released by Multiverse Computing

Downloads last month: 41

Safetensors

Model size

80B params

Tensor type

BF16

Model tree for MultiverseComputingCAI/Qwen3-Next-80B-A3B-Thinking-Uncensored

Base model

Qwen/Qwen3-Next-80B-A3B-Thinking

Finetuned

(14)

this model

Quantizations

2 models

Paper for MultiverseComputingCAI/Qwen3-Next-80B-A3B-Thinking-Uncensored

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics

Paper • 2512.16602 • Published Dec 18, 2025