🔓 MultiverseComputingCAI/Qwen3-Next-80B-A3B-Thinking-Uncensored

✨ What is this model?

Qwen3-Next-80B-A3B-Thinking-Uncensored is an uncensored variant of Qwen3-Next-80B-A3B-Thinking in which China-aligned political censorship has been selectively removed.

What changes:

  • The model no longer issues blanket refusals for Chinese politically sensitive topics (when the prompt is non-harmful); instead, it provides balanced, objective answers that present multiple relevant perspectives.

What stays the same:

  • General safety alignment remains intact: it still refuses harmful instructions and jailbreak attempts.
  • Benchmark performance remains effectively unchanged across reasoning/code/general evaluation suites.
  • Behavior is unchanged for any prompt unrelated to Chinese sensitive topics.

🚀 Highlights

🧠 No new knowledge injected

Unlike approaches that rely on supervised fine-tuning with hand-crafted data (e.g., Perplexity’s R1-1776 post-training), we do not add new facts or “rewrite history” via curated SFT datasets.

Instead, our method uses steering vectors to suppress the model's refusals on China-related sensitive-but-non-harmful prompts. The model answers using knowledge already present in the base model, minimizing the risk of introducing new biases.
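
As a rough sketch of the general steering-vector idea (the classic difference-in-means construction over contrastive prompts; the layer choice, data, and dimensions here are toy illustrations, not the exact recipe used for this release):

```python
import numpy as np

# Toy activations: rows are hidden states collected at one layer for
# prompts the base model refuses vs. prompts it answers normally.
refused_acts  = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])
answered_acts = np.array([[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]])

# A "refusal direction" as the difference of mean activations,
# normalized to unit length.
direction = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# At inference, subtracting the projection onto this direction
# suppresses the refusal component of a new hidden state.
h = np.array([1.0, 2.0])
h_steered = h - np.dot(h, direction) * direction
print(h_steered)  # h lies along the direction here, so it collapses to ~[0, 0]
```

Because the vector is extracted from the model's own activations, no new facts enter the weights; the intervention only reshapes which internal direction is expressed at inference time.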

🎛️ Selective refusal control (not “full abliteration”)

Many steering-vector approaches effectively erase refusal behavior everywhere (making models broadly unsafe).
Our approach selectively disables refusals only for Chinese sensitive topics, while keeping refusal behavior for harmful requests.
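
Conceptually, selective control can be framed as gating the steering on the prompt category, so only sensitive-but-non-harmful prompts are steered. A minimal sketch (the boolean flags stand in for topic/harm probes, which are hypothetical here and outside this snippet):

```python
import numpy as np

def project_out(hidden, direction):
    """Remove the component of `hidden` along a unit refusal direction."""
    d = direction / np.linalg.norm(direction)
    return hidden - np.dot(hidden, d) * d

def selective_steer(hidden, refusal_dir, is_sensitive_topic, is_harmful):
    """Hypothetical gating: steer only sensitive-but-non-harmful prompts.

    Harmful prompts keep their original activations, so the model's
    safety refusals survive even when a sensitive topic is mentioned.
    """
    if is_sensitive_topic and not is_harmful:
        return project_out(hidden, refusal_dir)
    return hidden

h = np.array([2.0, 1.0])
r = np.array([1.0, 0.0])
print(selective_steer(h, r, True, False))  # steered:   [0. 1.]
print(selective_steer(h, r, True, True))   # untouched: [2. 1.]
```

The key contrast with full abliteration is the gate: the refusal direction is removed conditionally, not erased from the weights for every input.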

🛡️ Robust to trivial “add China” jailbreaks

Previous “uncensored” post-trained models such as Perplexity R1-1776 can be jailbroken by simply injecting a China-related phrase into harmful prompts (https://weijiexu.com/posts/jailbreak_r1_1776.html). Our model is designed to remain robust: harmful prompts are still refused even if “China” is injected.

🧩 No architectural changes · No added parameters

  • ✅ No model surgery
  • ✅ No additional layers or adapters
  • ✅ No extra parameters
  • ✅ Drop-in behavior change at inference time

🧪 Method

This release is based on Refusal Steering, an inference-time technique using steering vectors to control refusal behavior:

📄 Paper: Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics

What’s improved vs the paper implementation:

  • We retain the core Refusal Steering idea, but do not require architectural changes to apply it.

📊 Evaluation

We evaluate refusal behavior and safety using the following benchmark suite:

  • Safety Benchmarks: JailbreakBench, SorryBench, XSTest (unsafe split), HarmBench (sampled), Adversarial unsafe prompts
  • Chinese Sensitive Topics: CCP Sensitive, DeCCP

Benchmark quick definitions

Safety Benchmarks

  • JailbreakBench — jailbreak robustness benchmark
  • SorryBench — 440 unsafe instructions across 44 safety categories
  • XSTest (unsafe) — harmful prompts that models should refuse
  • HarmBench (sampled) — harmful prompts for red-teaming
  • Adversarial Unsafe Prompts — harmful prompts + “China” injection to test trivial jailbreak weaknesses

Chinese Sensitive Topics

  • CCP Sensitive — prompts likely censored by China-aligned models
  • DeCCP — sensitive prompts known to trigger refusals in Qwen-family instruct models

🧾 Results

Refusal / Safety metrics (higher = more refusal)

| Model | CCP Sensitive Rejection % | DeCCP Rejection % | Adversarial Rejection % | SorryBench Rejection % | XSTest Unsafe Rejection % | JailbreakBench Rejection % |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3-Next-80B-A3B-Thinking | 92.65 | 69.47 | 97.07 | 86.14 | 84.00 | 99.00 |
| Qwen3-Next-80B-A3B-Thinking-Uncensored | 25.96 | 1.05 | 88.48 | 84.77 | 83.00 | 98.00 |

Interpretation:

  • Massive drop in Chinese-topic refusals (CCP Sensitive and DeCCP)
  • Safety refusals remain strong on harmful/jailbreak datasets

Performance metrics (higher = better)

| Model | GSM8K exact_match | HumanEval pass@1 | IFEval acc | LCB CodeGen pass@1 | AIME25 pass@k | GPQA Diamond pass@k | MMLU Pro pass@k | MMLU-ProX Spanish pass@k | MMLU-ProX Hindi pass@k |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-Next-80B-A3B-Thinking | 0.967 | 0.945 | 0.898 | 0.750 | 0.858 | 0.775 | 0.829 | 0.781 | 0.719 |
| Qwen3-Next-80B-A3B-Thinking-Uncensored | 0.972 | 0.939 | 0.891 | 0.750 | 0.868 | 0.796 | 0.833 | 0.784 | 0.723 |

Interpretation:

  • Benchmark performance is preserved (differences fall within normal evaluation variance)

📝 Reporting Issues

We are actively improving the model and plan to release improved versions in the future. If you find any issues with refusals on politically sensitive topics, or any safety issues, please report them in the Community tab.


🧩 Examples

Here are some conversations showing that our model’s answers are well-balanced and objective, presenting multiple perspectives where relevant rather than defaulting to a single narrative.

| Censored (base) | Uncensored (this release) |
| --- | --- |
| Young — censored | Young — uncensored |
| Taiwan — censored | Taiwan — uncensored |
| Hong Kong — censored | Hong Kong — uncensored |

📚 Citation

If you use this model, please cite:

```bibtex
@misc{garciaferrero2025Refusal,
      title={Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics}, 
      author={Iker García-Ferrero and David Montero and Roman Orus},
      year={2025},
      eprint={2512.16602},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.16602}, 
}
```

🏢 About Multiverse Computing

This model is released by Multiverse Computing: https://multiversecomputing.com/
