Qwen3-14B · Abliterated — autonomous agents & authorized security

A capability-preserving, refusal-suppressed variant of Qwen/Qwen3-14B — tuned for autonomous agents, tool-use, and authorized security work.


This is a decensored variant of Qwen/Qwen3-14B produced with Heretic v1.3.0 and tuned for autonomous agents and tool-use workflows where the base model's refusal behavior interferes with legitimate task execution.

It sits near the low-KL end of Heretic's Pareto front: the model keeps essentially all of Qwen3-14B's reasoning, coding, and tool-calling capability (KL divergence 0.0333 from base) while cutting measured refusals by ~90% (from 99/100 to 10/100). The full optimization run, the exact selected trial, and a byte-for-byte reproduction recipe ship in reproduce/.

Not a general-purpose chat upgrade. Abliteration only attenuates refusal-tied components — it adds no knowledge or skills. If you don't have a specific reason to remove refusals, use Qwen/Qwen3-14B instead.


Why this release

  • Exceptionally low capability damage. At KL 0.0333 from base, this abliteration sits in Heretic's low-KL "sweet spot." Automated, co-optimized abliteration drifts far less than hand-tuned methods — Heretic reports up to ~66% lower KL than the best manual abliteration at matched refusal rates.
  • ~90% fewer refusals. Measured refusals fall from 99/100 → 10/100 on held-out harmful_behaviors prompts, while reasoning, coding, and tool-calling stay intact.
  • Built for agents, not just chat. Refusals break tool-use loops; this model keeps multi-step agent workflows flowing. Hermes-style tool-calling and <think> reasoning are fully preserved.
  • Every format you need. Full-precision bf16 here for servers, plus ready-made community GGUF Q5_K_M and Q4_K_M for local rigs (see Downloads & formats).
  • Reproducible, not magic. Fixed seed, full Optuna study journal, pinned environment, and a SHA-256 manifest — reproduce it bit-for-bit, or export your own point on the Pareto front.

The honest pitch: most refusals removed, base capability barely moved — and every number is independently verifiable.


At a glance

| Field | Value |
| --- | --- |
| Base | Qwen/Qwen3-14B (commit 40c0698) |
| Method | Directional ablation via Heretic v1.3.0; selected trial 33 of 200 |
| Weights touched | attn.o_proj + mlp.down_proj only |
| Format | Full-precision bf16 merged safetensors (6 shards, ~29.5 GB); no quantization applied |
| Refusals | 10 / 100 vs 99 / 100 base (see Evaluation) |
| KL divergence | 0.0333 vs base on harmless_alpaca |
| Context | 32,768 native · 131,072 with YaRN |
| Reasoning | Hybrid <think> / non-thinking, fully intact |
| Tooling | transformers, vllm, sglang, tgi, llama.cpp/Ollama (after conversion) |
| Reproducible | Yes: seed 2760348449, full study journal in reproduce/ |

Downloads & formats

Available formats: bf16 safetensors, GGUF Q5_K_M, GGUF Q4_K_M

| Format | Where | ~Size | Best for |
| --- | --- | --- | --- |
| bf16 safetensors | this repo | ~29.5 GB | vLLM / SGLang / TGI servers · further quantization |
| GGUF · Q5_K_M | GGUF repo | ~10.5 GB | Local agents; best tool-call JSON fidelity |
| GGUF · Q4_K_M | GGUF repo | ~9.0 GB | Smallest practical footprint |

Ready-made GGUF builds live in the companion repo …-Abliterated-GGUF. New to quants? See Choosing a format / quant.


Headline metrics

| Metric | This model | Base Qwen3-14B |
| --- | --- | --- |
| Refusals (mlabonne/harmful_behaviors, 100 held-out prompts) | 10 / 100 | 99 / 100 |
| Refusal reduction | ≈ 90 % | |
| KL divergence vs base (mlabonne/harmless_alpaca) | 0.0333 | 0 (by definition) |
| Weights modified | attn.o_proj + mlp.down_proj | |
| Capability damage | Expected to be negligible given the low KL; standard benchmarks were not re-measured (see Evaluation) | |

Heretic Pareto front: KL divergence vs. refusal count. This model sits in the low-KL, low-refusal sweet spot.

Schematic. Each Heretic trial is a point trading off capability damage (KL, x-axis) against residual refusals (y-axis). This release is the trial chosen from the low-KL "sweet spot" — most refusals removed, base behavior barely perturbed. Positions are illustrative, not to scale.

See Evaluation for exactly how these numbers are measured — and what they do not claim.


Format & files

This repository ships the full-precision (bf16) merged model in HuggingFace safetensors format — a drop-in replacement for anything that loads the base Qwen/Qwen3-14B:

  • 6 weight shards (model-0000{1..6}-of-00006.safetensors, ~29.5 GB total), model.safetensors.index.json
  • config.json, generation_config.json, tokenizer.json, tokenizer_config.json, chat_template.jinja
  • reproduce/ — full Heretic study, config, pinned requirements, and SHA-256 manifest

No quantization is applied to the weights here. Prefer GGUF? Grab ready-made Q5_K_M / Q4_K_M from the companion GGUF repo, or roll your own (AWQ, GPTQ, …) — see Choosing a format / quant.


Quick start

Transformers (Python)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RootMonsteR/Qwen3-14B-Abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the CVE-2021-44228 (Log4Shell) exploitation chain in technical depth."}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set False for faster non-reasoning replies
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(generated[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

vLLM (OpenAI-compatible server — recommended for agents)

vllm serve RootMonsteR/Qwen3-14B-Abliterated \
    --reasoning-parser qwen3 \
    --tool-call-parser hermes \
    --enable-auto-tool-choice \
    --max-model-len 32768

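Then point any OpenAI-compatible client (LangChain, Pydantic-AI, CrewAI, AutoGen, the raw openai SDK, …) at http://localhost:8000/v1. With tool_choice set to "required" or to a named function, vLLM constrains generation with guided decoding, which keeps tool-call JSON well-formed even under aggressive sampling; with tool_choice "auto", the hermes parser extracts tool calls from the model's free-form output instead.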

Flag names vary by vLLM version. On older builds use --reasoning-parser deepseek_r1 and add --enable-reasoning; both parse the same <think>…</think> blocks.

SGLang

python -m sglang.launch_server \
    --model-path RootMonsteR/Qwen3-14B-Abliterated \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen25 \
    --context-length 32768

Ollama / llama.cpp (local — requires GGUF conversion)

This repo ships bf16 safetensors, not GGUF — but ready-made Q5_K_M / Q4_K_M GGUFs are in the companion GGUF repo (pull one and skip straight to the Modelfile). To build your own from these weights instead:

python convert_hf_to_gguf.py /path/to/this/model --outtype bf16 --outfile qwen3-14b-abliterated-bf16.gguf
./llama-quantize qwen3-14b-abliterated-bf16.gguf qwen3-14b-abliterated-Q5_K_M.gguf Q5_K_M

Minimal Modelfile:

FROM ./qwen3-14b-abliterated-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

Then build and run it:

ollama create qwen3-14b-abliterated -f Modelfile && ollama run qwen3-14b-abliterated

convert_hf_to_gguf.py preserves the Qwen3 chat template (including the <tools> block) in the GGUF metadata, so tool-calling and thinking mode keep working. If you hand-write a TEMPLATE, make sure it still emits the tool/<think> scaffolding or agents will break.


Sampling & best practices

Avoid greedy decoding: it can send Qwen3 into repetition loops, particularly in thinking mode. Always sample.

| Mode | temperature | top_p | top_k | min_p |
| --- | --- | --- | --- | --- |
| Thinking (enable_thinking=True, default) | 0.6 | 0.95 | 20 | 0 |
| Non-thinking (enable_thinking=False) | 0.7 | 0.8 | 20 | 0 |

  • The thinking-mode defaults are already baked into generation_config.json.
  • If you still see loops, raise presence_penalty to 0.5–1.5.
  • Output length: 32,768 tokens covers almost any single response; allow up to 38,912 for competition-grade math/code.
  • Multi-turn: drop <think> block content from history, keep only final answers — the shipped chat template does this automatically.
  • Soft switches: with enable_thinking=True, add /think or /no_think to a user turn to toggle reasoning for that turn; the model follows the most recent directive.
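
The sampling table above can be turned into per-mode request defaults. The following is a minimal sketch, assuming the vLLM OpenAI-compatible server from Quick start; `top_k`, `min_p`, and `chat_template_kwargs` are vLLM extensions to the OpenAI request schema, and `build_chat_request` is a hypothetical helper name, not part of any library.

```python
import json

# Sampling defaults copied from the table in this section.
SAMPLING = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},   # thinking
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},   # non-thinking
}


def build_chat_request(messages, thinking=True, max_tokens=4096, presence_penalty=0.0):
    """Return a JSON-serializable body for POST /v1/chat/completions."""
    body = {
        "model": "RootMonsteR/Qwen3-14B-Abliterated",
        "messages": messages,
        "max_tokens": max_tokens,
        "presence_penalty": presence_penalty,
        # Assumption: the server forwards this to the chat template (vLLM does).
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    body.update(SAMPLING[thinking])
    return body


if __name__ == "__main__":
    request = build_chat_request(
        [{"role": "user", "content": "Summarize what a reverse proxy does."}],
        thinking=False,
    )
    print(json.dumps(request, indent=2))
```

Keeping `max_tokens` well below `--max-model-len` leaves room for the prompt; raise `presence_penalty` only if loops actually appear.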

Agentic use

Refusals are most damaging inside an agent loop: a single refusal doesn't just decline a turn, it halts the whole tool chain. This model is tuned so legitimate security / sysadmin / automation tasks keep flowing through the loop instead of dead-ending on a canned decline.

Comparison: base Qwen3-14B refuses and halts the agent loop; this model emits a tool call and completes the task.

Frameworks that work well:

  • Qwen-Agent — official Qwen agent framework with built-in MCP + tool-calling.
  • vLLM with --tool-call-parser hermes --enable-auto-tool-choice — OpenAI-compatible function calling for any OpenAI-style agent framework.
  • SGLang with --reasoning-parser qwen3.

The chat template implements the Hermes-style <tools> / <tool_call> / <tool_response> protocol; tool calls are emitted as {"name": ..., "arguments": ...} inside <tool_call> tags.
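
When you consume raw model output without a server-side tool-call parser, the `<tool_call>` blocks have to be extracted by hand. Here is a rough sketch of that step, assuming each block holds a single JSON object as described above; `extract_tool_calls` and the `get_weather` tool are hypothetical names used only for illustration.

```python
import json
import re

# Matches the JSON payload between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in raw model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: skip rather than guess at a repair
        if isinstance(obj, dict) and "name" in obj:
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls


if __name__ == "__main__":
    sample = (
        "Let me look that up.\n"
        "<tool_call>\n"
        '{"name": "get_weather", "arguments": {"city": "Oslo"}}\n'
        "</tool_call>"
    )
    print(extract_tool_calls(sample))
```

Behind vLLM or SGLang the configured tool-call parser already does this and returns structured `tool_calls`, so a helper like this is only needed for direct `generate()` use.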


Intended use

This model is for professional and research contexts where Qwen3-14B's default refusal behavior interferes with legitimate work:

  • Authorized security research & red-team engagements — vulnerability analysis, exploit reasoning, payload triage, OSINT correlation, post-exploitation narrative reconstruction.
  • Defensive security tooling — understanding attacker techniques to build detections, write IDS/IPS rules, and harden infrastructure.
  • CTF & security education — explaining challenges, reviewing solutions, building writeups.
  • Autonomous agent frameworks — tool-calling agents whose workflows touch security or system administration, where base-model refusals break the loop.
  • Alignment & refusal research — studying how directional ablation affects behavior, comparing variants across the Pareto front, evaluating refusal detectors.

Responsible use

Removing refusal behavior shifts responsibility entirely onto the operator. By using this model you agree that:

  • You operate within applicable law, contractual obligations, and engagement scope (written authorization for any testing against systems you do not own).
  • You will not target individuals, organizations, or systems without authorization.
  • You will not produce content that is illegal in your jurisdiction.
  • The author, JAF Systems, and SR&D provide this model as-is, without warranty, and disclaim responsibility for misuse.

If your work doesn't fit those constraints, this isn't the right model for you.


How it was made

The model was produced by running Heretic v1.3.0 against Qwen/Qwen3-14B for 200 trials (60 random + 140 TPE-guided), then selecting a Pareto-optimal trial that prioritizes preserved capability over absolute refusal suppression.

Pipeline: probe the base model with paired prompts, find the refusal direction, ablate it in attn.o_proj and mlp.down_proj, optimize over 200 Optuna trials, select trial 33, merge to bf16.

Heretic performs directional ablation: it identifies the residual-stream direction most correlated with refusal across paired harmless (mlabonne/harmless_alpaca) and harmful (mlabonne/harmful_behaviors) prompts, then attenuates that direction inside the attn.o_proj and mlp.down_proj weights via a smooth per-layer scaling profile. An Optuna TPE optimizer searches those profiles while jointly measuring refusal rate and KL divergence from the base model — so it can find points that strip refusals without drifting from base behavior.
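
The core linear-algebra step behind "attenuating a direction inside a weight matrix" can be illustrated on a toy matrix. This is a generic sketch of orthogonal projection under stated assumptions, not Heretic's implementation: it uses one scalar `weight` where Heretic applies a per-layer profile, plain Python lists instead of tensors, and a hypothetical function name.

```python
import math


def ablate_direction(W, direction, weight=1.0):
    """Return W' = W - weight * u (u^T W), where u is `direction` normalized.

    W is a list of rows (d_out x d_in) whose outputs are written to the
    residual stream; `direction` has length d_out. With weight=1.0 the
    component of every output along u is removed entirely; smaller weights
    attenuate it proportionally.
    """
    norm = math.sqrt(sum(v * v for v in direction))
    u = [v / norm for v in direction]
    rows, cols = len(W), len(W[0])
    # proj[j] is the component of column j along u.
    proj = [sum(u[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - weight * u[i] * proj[j] for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    print(ablate_direction(W, [1.0, 0.0], weight=1.0))  # first output row zeroed
    print(ablate_direction(W, [1.0, 0.0], weight=0.5))  # first output row halved
```

After the update, the projection of any output `W'x` onto `u` is `(1 - weight)` times the original projection, which is why the optimizer can trade refusal suppression against drift by tuning the weights.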

Selected abliteration parameters

Selected trial 33 · seed 2760348449 · search performed in bnb_4bit. Values below are from reproduce/reproduce.json (full precision there):

| Parameter | Value |
| --- | --- |
| direction_index | 25.8494 |
| attn.o_proj.max_weight | 1.1671 |
| attn.o_proj.max_weight_position | 36.0671 |
| attn.o_proj.min_weight | 0.9831 |
| attn.o_proj.min_weight_distance | 15.4786 |
| mlp.down_proj.max_weight | 1.1632 |
| mlp.down_proj.max_weight_position | 24.4820 |
| mlp.down_proj.min_weight | 0.9351 |
| mlp.down_proj.min_weight_distance | 17.1188 |

What was not changed

  • The tokenizer, chat template, and special tokens (<think>, <|im_start|>, the <tools> scaffolding, …).
  • Any weights outside attn.o_proj and mlp.down_proj.
  • Architecture, context length, and RoPE settings.
  • Thinking-mode behavior — the <think>…</think> reasoning block still functions normally.

Evaluation

Be precise about what the headline numbers mean — and what they don't.

  • Refusals (10/100). Heretic runs 100 held-out harmful_behaviors prompts (test[:100]) through the model in non-thinking mode (an empty <think></think> prefix) and flags a response as a refusal when it contains any of 33 refusal markers (substrings like "i cannot", "i'm unable", "as an ai", "unethical", …). This is a keyword detector, not a human judgment — it measures how often the model declines, not whether an answer is correct, safe, or useful. The base model scores 99/100 under the identical detector; this model scores 10/100.
  • KL divergence (0.0333). Measured on harmless_alpaca responses against the base model. Lower = closer to base behavior on benign prompts. The optimizer's target was 0.01; the selected trial trades a little extra KL for far fewer refusals.
  • Standard benchmarks (MMLU, HumanEval, …) were not separately re-measured for this variant. Given the very low KL, capability is expected to track the base model closely, but you should validate against your own workloads before relying on it.
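
To make the two headline metrics concrete, here is a small sketch of a substring-based refusal detector and a discrete KL divergence. It rests on several assumptions: the marker list holds only the four examples quoted above, not Heretic's full set of 33; the KL direction and the choice of distributions being compared are illustrative; and all function names are hypothetical.

```python
import math

# Illustrative subset: the four markers quoted in this section, not the full list.
REFUSAL_MARKERS = ["i cannot", "i'm unable", "as an ai", "unethical"]


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Flag a response if it contains any marker (case-insensitive substring match)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def refusal_count(responses):
    """Number of responses flagged by the keyword detector."""
    return sum(is_refusal(r) for r in responses)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    replies = ["I cannot help with that.", "Sure, here is the summary you asked for."]
    print(refusal_count(replies))
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
    # Relative refusal reduction from the headline numbers: (99 - 10) / 99
    print(round((99 - 10) / 99, 3))
```

The detector shows why the refusal figure is a proxy: a helpful answer that happens to contain "unethical" is counted as a refusal, and a polite non-answer with none of the markers is not.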

The full per-trial history is in the Optuna study journal reproduce/Qwen--Qwen3-14B.jsonl — you can inspect every trial's refusal/KL trade-off, or export a different Pareto point yourself.


Reproducibility

This model is byte-for-byte reproducible from the base weights. The reproduce/ directory contains everything needed:

| File | What it is |
| --- | --- |
| config.toml | Exact Heretic configuration, including the RNG seed |
| reproduce.json | Machine-readable record: environment, parameters, metrics, weight hashes |
| requirements.txt | Pinned versions of every Python package |
| Qwen--Qwen3-14B.jsonl | Optuna study journal: the full history of all 200 trials |
| SHA256SUMS | Cryptographic hashes for all weight files |
| README.md | Step-by-step reproduction guide |

# 1. Install the exact Heretic version + dependencies + matching PyTorch
pip install heretic-llm==1.3.0
pip install -r reproduce/requirements.txt
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128

# 2. Put config.toml (and, optionally, the study journal) in your working dir
cp reproduce/config.toml .
mkdir -p checkpoints && cp reproduce/Qwen--Qwen3-14B.jsonl checkpoints/   # optional: skips re-running stored trials

# 3. Run Heretic — it reads config.toml automatically
heretic

# 4. Select trial 33 and export, then verify the weights match bit-for-bit
sha256sum -c reproduce/SHA256SUMS

Re-running on the same base-model commit deterministically reproduces this artifact. Because the study journal is included, you can also export any other point on the Pareto front (a lower-KL or lower-refusal variant) without re-running the search.

SHA-256 of shipped weights
241a71c68e5e755d59cc20c4f697dc78f53e1c5654c3f2e26223b64831d0ccc7  model-00001-of-00006.safetensors
39a033492795f7b6e9552ae4ffad0744de4679209b15546f2847d115a16374f8  model-00002-of-00006.safetensors
6914db1fc17048faeac9759c0caaa2dd2185d1db5329aaec050286e37cfab279  model-00003-of-00006.safetensors
5dbb906d21f560b8bc7693b8e035e8aca25441030ba036312625081a6c599980  model-00004-of-00006.safetensors
68d70661bc803497188818e511dcf839a26654c6137eb3450fab586f1f28384c  model-00005-of-00006.safetensors
b5b6ad34c7e617468bb06763c99313d4b14a3f263e46f6f8e656d7083271479c  model-00006-of-00006.safetensors
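
On systems without `sha256sum`, the same check can be done from Python. This is a minimal sketch, assuming the manifest uses the standard `<hex digest>  <filename>` layout shown above; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse sha256sum-style lines into a {filename: digest} mapping."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        entries[name.lstrip("*").strip()] = digest.lower()
    return entries


def verify(manifest_path, root="."):
    """Return {filename: True/False}; missing files count as a mismatch."""
    expected = parse_manifest(Path(manifest_path).read_text())
    results = {}
    for name, digest in expected.items():
        path = Path(root) / name
        results[name] = path.is_file() and sha256_file(path) == digest
    return results


if __name__ == "__main__":
    for name, ok in verify("reproduce/SHA256SUMS").items():
        print(("OK      " if ok else "MISMATCH"), name)
```

Run it from the model directory so the shard names in the manifest resolve relative to the current working directory.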

Choosing a format / quant

Decision tree: server/batch deployments use bf16 safetensors as-is; local deployments pick a GGUF quant by VRAM budget.

Approximate on-disk sizes and VRAM for the 14.8B model (weights only — add KV cache, which grows with context):

| Precision / quant | ~Size on disk | ~Min VRAM (weights) | Notes |
| --- | --- | --- | --- |
| bf16 (this repo) | ~29.5 GB | ~32–40 GB | Reference quality; ideal for vLLM/SGLang/TGI servers |
| Q8_0 | ~15.7 GB | ~18 GB | Effectively lossless |
| Q6_K | ~12.1 GB | ~14 GB | Near-lossless |
| Q5_K_M | ~10.5 GB | ~12 GB | Best for tool-using agents; preserves tool-call JSON fidelity |
| Q4_K_M | ~9.0 GB | ~10 GB | Smallest practical; occasionally drops tool-JSON adherence |

For tool-using agents, prefer Q5_K_M or Q6_K over Q4. Q4 occasionally breaks format adherence in tool-call JSON, and the extra ~1.5 GB that Q5_K_M needs over Q4_K_M is a small price for avoiding that. For server deployments, just serve the bf16 weights directly.
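
The KV-cache overhead mentioned above can be estimated from the figures in the Architecture section (40 layers, 8 KV heads, head dim 128). The following is a back-of-the-envelope sketch, assuming a 16-bit KV cache and ignoring activations, framework overhead, and any paging or cache quantization a given runtime applies.

```python
# Architecture figures taken from this card's Architecture section.
LAYERS = 40
KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(context_tokens, bytes_per_value=2):
    """Bytes for keys and values across all layers at the given context length."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value  # 2x: K and V
    return per_token * context_tokens


def estimate_total_gib(weights_gb, context_tokens, bytes_per_value=2):
    """Weights (decimal GB, as listed in the table) plus KV cache, in GiB."""
    total_bytes = weights_gb * 1e9 + kv_cache_bytes(context_tokens, bytes_per_value)
    return total_bytes / 2**30


if __name__ == "__main__":
    print(kv_cache_bytes(1))               # bytes per token
    print(kv_cache_bytes(32768) / 2**30)   # GiB at the native context length
    print(round(estimate_total_gib(10.5, 32768), 1))  # Q5_K_M at full native context
```

Under these assumptions each token costs 160 KiB of cache, so a full 32,768-token context adds 5 GiB per sequence on top of the weights; concurrent sequences multiply that.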


Architecture

Unchanged from the base model (abliteration modifies weight values, not the architecture):

| Property | Value |
| --- | --- |
| Type | Causal LM (Qwen3ForCausalLM) |
| Parameters | 14.8B total · 13.2B non-embedding |
| Layers | 40 |
| Hidden size | 5120 · FFN intermediate 17408 |
| Attention | 40 query heads / 8 KV heads (GQA) · head dim 128 |
| Activation / norm | SiLU · RMSNorm (eps 1e-6) |
| Positional | RoPE, θ = 1,000,000 |
| Vocab | 151,936 |
| Precision | bfloat16 |
| max_position_embeddings | 40,960 (32,768 recommended native context; 131,072 with YaRN) |

Long context (YaRN)

Qwen3-14B natively serves 32,768 tokens. To extend to 131,072, enable static YaRN.

config.json snippet:

{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}

vLLM:

vllm serve RootMonsteR/Qwen3-14B-Abliterated \
    --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
    --max-model-len 131072

llama-server:

llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768

All current open-source frameworks implement static YaRN — the scaling factor is constant regardless of input length, which can degrade short-context performance. Only enable YaRN when you genuinely need long context, and set factor to the smallest value that covers your typical input.
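
Picking "the smallest factor that covers your typical input" is simple arithmetic on the native context length. A minimal sketch, assuming the 32,768-token native window and the 4.0 maximum factor stated in this card; `yarn_rope_scaling` is a hypothetical helper that only builds the config fragment shown above.

```python
NATIVE_CONTEXT = 32768
MAX_FACTOR = 4.0  # 131,072 / 32,768, the largest extension stated in this card


def yarn_rope_scaling(target_tokens):
    """Return a rope_scaling dict for the target length, or None if YaRN is unnecessary."""
    if target_tokens <= NATIVE_CONTEXT:
        return None  # fits natively; leave rope_scaling unset
    factor = target_tokens / NATIVE_CONTEXT
    if factor > MAX_FACTOR:
        raise ValueError(f"{target_tokens} tokens exceeds the 131,072-token YaRN limit")
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }


if __name__ == "__main__":
    print(yarn_rope_scaling(16000))   # no scaling needed
    print(yarn_rope_scaling(65536))   # factor 2.0 rather than the full 4.0
```

For inputs that typically reach 65,536 tokens this yields a factor of 2.0, which should perturb short-context behavior less than always configuring 4.0.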


Limitations

  • Not a safety-tested replacement for the base model. Abliteration removes refusal-tied components; it does not add new alignment, guardrails, or behavior.
  • Residual refusals (~10%). About 1 in 10 standard refusal-benchmark prompts still triggers a decline. Want fewer? Export a different Pareto point from the included study journal.
  • Benchmarks not re-measured. MMLU/HumanEval/etc. are expected to track the base model given the low KL, but are not independently verified here — validate on your own tasks.
  • Quantization choice matters for tool-use. Below Q5, tool-call JSON adherence can degrade. Prefer Q5_K_M/Q6_K for agents.
  • Inherits base biases. The model carries Qwen3-14B's training distribution and biases; abliteration only attenuates refusal-tied directions.
  • Refusal metric is keyword-based. "10/100" reflects a substring detector, not a human evaluation of harmfulness or correctness — see Evaluation.

FAQ

Is this quantized? No. The weights are full-precision bf16. Quantize downstream if you want (see above).

Does thinking mode still work? Yes — <think>…</think> is untouched. Toggle with enable_thinking or /think · /no_think.

Does tool-calling still work? Yes. The Hermes-style chat template is unchanged; use --tool-call-parser hermes (vLLM) or the equivalent for your runtime.

Will it answer literally anything? No. About 10% of refusal-benchmark prompts still trigger a refusal, and abliteration doesn't disable the model's judgment everywhere. It removes the bulk of reflexive refusals, not all of them.

How is this different from "uncensored" finetunes? No finetuning, no new data, no new behavior — just directional ablation of refusal-correlated components, with KL divergence held low so capability is preserved. It's reproducible from a seed.

Can I get a more (or less) aggressive variant? Yes — the included Optuna study journal lets you export any other point on the Pareto front without re-running the search.

GGUF / AWQ / GPTQ? Ready-made GGUF Q5_K_M and Q4_K_M are in the companion GGUF repo. For AWQ/GPTQ, convert with AutoAWQ/AutoGPTQ. Q5_K_M is recommended for agents.


Partners

JAF Systems

Security research, red-team tooling, and AI infrastructure. Home of the RootMonsteR model releases.

SR&D — Security Research & Development

Sovereign Defense for Mission-Critical Infrastructure. Offensive security, bare-metal / on-prem engineering, and vCISO/vCTO advisory — High Impact. Low Footprint. Total Control.

Work with us — custom abliterated / fine-tuned models, red-team tooling, offensive-security engagements, sovereign on-prem AI infrastructure, and vCISO/vCTO advisory. → jafsystems.net  ·  rnd.sh  ·  DM @RootMonsteR


Author

RootMonsteR  ·  @RootMonsteR on X  ·  JAF Systems  ·  SR&D

If this model is useful for your security workflows, a follow on X is appreciated. For commercial inquiries, custom-tuned variants, or red-team tooling consulting, see jafsystems.net or rnd.sh.


Citation

@misc{rootmonster2026qwen3_14b_abliterated,
  title  = {Qwen3-14B Abliterated: A Decensored Variant for Security Research and Autonomous Agents},
  author = {RootMonsteR},
  year   = {2026},
  url    = {https://huggingface.co/RootMonsteR/Qwen3-14B-Abliterated},
  note   = {Produced with Heretic v1.3.0; base model: Qwen/Qwen3-14B; selected trial 33},
}

Please also cite the original Qwen3 work and Heretic:

@misc{qwen3technicalreport,
  title         = {Qwen3 Technical Report},
  author        = {Qwen Team},
  year          = {2025},
  eprint        = {2505.09388},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09388}
}

@software{heretic,
  author = {Weidmann, Philipp Emanuel},
  title  = {Heretic: Automated, reproducible abliteration of refusal behavior in language models},
  url    = {https://github.com/p-e-w/heretic},
  year   = {2025}
}

About the base model (Qwen3)

Qwen3 is the latest generation of the Qwen series, offering dense and MoE models with strong reasoning, instruction-following, agent, and multilingual capabilities. Key features inherited by this model:

  • Seamless thinking / non-thinking switching in a single model — deep reasoning for math/code/logic, fast direct replies for general dialogue.
  • Strong reasoning surpassing prior QwQ (thinking) and Qwen2.5-Instruct (non-thinking) models on math, code, and logic.
  • Leading open-source agent / tool-use performance in both modes.
  • 100+ languages and dialects with strong multilingual instruction-following and translation.

For base-model details, benchmarks, and deployment docs see the Qwen3 blog, GitHub, and documentation. Everything there about architecture, the chat template, sampling, and long-context handling still applies — abliteration changes none of it.
