---
tags:
- heretic
- uncensored
- abliterated
- gguf
license: other
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
---
# Qwen2.5-Coder-32B-Instruct-heretic

Abliterated (uncensored) version of Qwen/Qwen2.5-Coder-32B-Instruct, created using Heretic and converted to GGUF.
## Abliteration Quality
| Metric | Value |
|---|---|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |
**Refusals** counts how many of 100 test prompts the model refused; lower is better. **KL divergence** measures how far the model's output distribution has drifted from the original model's on harmless prompts; lower means behavior closer to the original.
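For context, the KL divergence reported above compares the original and abliterated models' next-token distributions. A minimal sketch of that computation (illustrative only; not the pipeline's actual evaluation code):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(logits_ref, logits_abl):
    """KL(P_ref || P_abl) between two next-token distributions."""
    p = softmax(logits_ref)
    q = softmax(logits_abl)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero divergence; any drift gives a positive value.
print(kl_divergence([2.0, 1.0, 0.5], [2.0, 1.0, 0.5]))  # → 0.0
print(kl_divergence([2.0, 1.0, 0.5], [1.0, 1.2, 0.5]))  # small positive number
```

In practice this is averaged over many prompts and token positions; a value of 0.0728 indicates the abliterated model stays close to the original.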
## Available Quantizations
| Quantization | File | Size |
|---|---|---|
| Q8_0 | Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf | 32.43 GB |
| Q6_K | Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf | 25.04 GB |
| Q4_K_M | Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf | 18.49 GB |
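As a rough way to compare these files, you can back out the implied bits per weight from each size. This assumes the listed sizes are GiB and a parameter count of roughly 32.8 billion (neither is stated in this card):

```python
# Sanity-checking the table: bits per weight implied by each file size.
# ASSUMPTIONS: sizes are GiB, and the model has ~32.8e9 parameters.
PARAMS = 32.8e9

def bits_per_weight(size_gib: float) -> float:
    """Convert a file size in GiB to average bits stored per parameter."""
    return size_gib * (1024 ** 3) * 8 / PARAMS

for name, size in [("Q8_0", 32.43), ("Q6_K", 25.04), ("Q4_K_M", 18.49)]:
    print(f"{name}: ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions the results land near llama.cpp's nominal sizes for these formats (about 8.5, 6.6, and 4.8 bits per weight), which suggests the quantizations are standard full-model quants.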
## Usage with Ollama

```shell
# Use the quantization tag you prefer:
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```
## bf16 Weights

The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.
## Usage with Transformers

The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
## About
This model was processed by the Apostate automated abliteration pipeline:
- The source model was loaded in bf16
- Heretic's optimization-based abliteration was applied to remove refusal behavior
- The merged model was converted to GGUF format using llama.cpp
- Multiple quantization levels were generated
The abliteration process uses directional ablation to remove the model's refusal directions while minimizing KL divergence from the original model's behavior on harmless prompts.
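The core projection step of directional ablation can be sketched as follows. This is a toy illustration with a made-up hidden state and refusal direction, not Heretic's actual implementation (which finds the direction and tunes per-layer ablation strengths automatically):

```python
import numpy as np

def ablate_direction(hidden, refusal_dir):
    """Remove the component of a hidden state along the refusal direction:
    h' = h - (h . r_hat) * r_hat, where r_hat is the unit-norm refusal direction.
    """
    r_hat = refusal_dir / np.linalg.norm(refusal_dir)
    return hidden - np.dot(hidden, r_hat) * r_hat

h = np.array([1.0, 2.0, 3.0])   # toy hidden state
r = np.array([0.0, 1.0, 0.0])   # toy refusal direction
print(ablate_direction(h, r))   # → [1. 0. 3.]
```

After ablation the hidden state has zero component along the refusal direction, while everything orthogonal to it (and hence most other behavior) is left untouched.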