Qwen 3 4B - Heretic (Abliterated)

An abliterated version of Qwen 3 4B created using Heretic v1.2.0. This model has reduced refusals while maintaining model quality, making it suitable as an uncensored text encoder for image generation models like Z-Image and FLUX.2 Klein 4B.

Model Details

  • Base Model: Qwen/Qwen3-4B
  • Abliteration Method: Heretic v1.2.0
  • Trials: 200
  • Trial Selected: Trial 96
  • Refusals: 3/100 (vs 100/100 original)
  • KL Divergence: 0.0000 (no measurable divergence from the original model's outputs)

Files

HuggingFace Format (for transformers, llama.cpp conversion)

model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
config.json
tokenizer.json
tokenizer_config.json

ComfyUI Format (for Z-Image / FLUX.2 Klein 4B text encoder)

comfyui/qwen3-4b-heretic.safetensors              # bf16, 7.5GB
comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors   # fp8, 4.1GB
comfyui/qwen3-4b-heretic_nvfp4.safetensors        # nvfp4, 2.6GB

GGUF Format (for llama.cpp and ComfyUI-GGUF)

Quant     Size      Notes
F16       ~7.5GB    Lossless reference
Q8_0      ~4GB      Excellent quality
Q6_K      ~3GB      Very good quality
Q5_K_M    ~2.7GB    Good quality
Q4_K_M    ~2.3GB    Recommended balance
Q3_K_M    ~1.9GB    For low VRAM only
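The GGUF quants above follow llama.cpp's standard naming, so they can also be reproduced from the HuggingFace-format files with llama.cpp's own tooling (local paths and output names below are illustrative):

```shell
# Convert the HF-format checkpoint to a lossless F16 GGUF
python convert_hf_to_gguf.py ./qwen3-4b-heretic \
    --outtype f16 --outfile qwen3-4b-heretic-F16.gguf

# Quantize down to the recommended Q4_K_M balance
llama-quantize qwen3-4b-heretic-F16.gguf qwen3-4b-heretic-Q4_K_M.gguf Q4_K_M
```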

NVFP4 Notes

The NVFP4 (4-bit floating point, E2M1) variant uses ComfyUI's native quantization format. It is roughly 3x smaller than bf16 and loads natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080 and newer) can use their FP4 tensor cores for best performance, while ComfyUI also supports software dequantization on older GPUs (tested working on an RTX 4090).
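To make the compactness concrete: each E2M1 weight is a 4-bit code (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), so only sixteen distinct values are representable before block scales are applied. A minimal decoder, assuming the standard E2M1 layout (illustrative only, not ComfyUI's actual kernel):

```python
def e2m1_decode(code: int) -> float:
    """Decode a 4-bit E2M1 code (0..15) to its float value."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        mag = 0.5 * man                        # subnormal: 0 or 0.5
    else:
        mag = (2 ** (exp - 1)) * (1 + 0.5 * man)  # normal: 1, 1.5, 2, 3, 4, 6
    return sign * mag

# The eight positive magnitudes -- note the coarse spacing at the top end,
# which is why per-block scaling factors are essential for weight quality
assert sorted({abs(e2m1_decode(c)) for c in range(16)}) == [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```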

Usage

With ComfyUI (Z-Image / FLUX.2 Klein 4B)

  1. Download a ComfyUI format file:

    • FP8 (recommended): comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors (4.1GB)
    • NVFP4 (smallest): comfyui/qwen3-4b-heretic_nvfp4.safetensors (2.6GB)
    • bf16 (full precision): comfyui/qwen3-4b-heretic.safetensors (7.5GB)
  2. Place in ComfyUI/models/text_encoders/

  3. In your Z-Image workflow, use the CLIPLoader (Load CLIP) node and select the heretic file

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-4b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-4b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
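Qwen 3 is a chat-tuned model, so prompting through its chat template usually gives better results than raw text. A sketch continuing from the snippet above (`enable_thinking=False` disables Qwen 3's thinking mode, which is an assumption about your use case, e.g. plain prompt expansion):

```python
messages = [{"role": "user", "content": "Describe a dramatic sunset over a cyberpunk city"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # open the assistant turn
    enable_thinking=False,       # plain prose, no <think> block
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```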

With llama.cpp

llama-server -m qwen3-4b-heretic-Q4_K_M.gguf
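llama-server exposes an OpenAI-compatible API (on port 8080 by default), so once it is running you can query the model with, for example, curl:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Describe a dramatic sunset over a cyberpunk city"}
        ],
        "max_tokens": 200
      }'
```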

Abliteration Process

Created using Heretic v1.2.0 with 200 optimization trials:

? Which trial do you want to use?
> [Trial  96] Refusals:  3/100, KL divergence: 0.0000  <-- selected
  [Trial  90] Refusals:  5/100, KL divergence: 0.0000
  [Trial  95] Refusals:  9/100, KL divergence: 0.0000
  [Trial 122] Refusals: 90/100, KL divergence: 0.0000
  ...

Trial 96 was selected for having the fewest refusals (3/100) with zero measurable KL divergence, indicating that the abliteration removed the refusal behavior without measurably affecting the model's other capabilities.
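Abliteration of this kind works by identifying a "refusal direction" in activation space and projecting it out of the weight matrices that write to the residual stream, so the model can no longer express that direction. A toy numpy sketch of the core projection (the direction r and matrix W are random stand-ins, not real model weights or Heretic's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # stand-in for a weight matrix writing to the residual stream
r = rng.standard_normal(8)
r /= np.linalg.norm(r)            # unit-norm "refusal direction"

# Ablate: subtract the component of every output along r
W_abl = W - np.outer(r, r) @ W

# The ablated matrix can no longer write anything in the r direction
assert np.allclose(r @ W_abl, 0.0)
```

Everything orthogonal to r passes through unchanged, which is why a well-chosen direction can suppress refusals while leaving the KL divergence on ordinary prompts near zero.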

Limitations

  • This model inherits all limitations of the base Qwen 3 4B model
  • Abliteration reduces but does not completely eliminate refusals (3/100 remain)

License

This model is released under the Apache 2.0 License, following the base Qwen 3 4B model license.
