Apostate Models Collection

Uncensored LLMs converted using an automated, iterative Heretic pipeline. Qwen2.5-Coder-32B is the strongest model in the collection, followed by Qwen3 and GLM-4.7-Flash.
Abliterated (uncensored) version of Qwen/Qwen2.5-Coder-32B-Instruct, created using Heretic and converted to GGUF.
| Metric | Value |
|---|---|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |
Refusals counts how many of 100 evaluation prompts the model refused; lower is better. Lower KL divergence means the abliterated model stays closer to the original model's behavior.
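For intuition, KL divergence compares two probability distributions, here the original and abliterated models' next-token distributions on harmless prompts. A minimal illustrative sketch (not the pipeline's actual evaluation code, and the example distributions are made up):

```python
import numpy as np

# KL(p || q): how much the distribution q diverges from p (0 = identical).
def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    p = p / p.sum()  # normalize to valid probability distributions
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])    # hypothetical original-model distribution
q = np.array([0.68, 0.22, 0.10]) # hypothetical abliterated-model distribution
print(kl_divergence(p, q))       # small positive number: close, but not identical
```

A value like the 0.0728 reported above indicates the abliterated model's outputs remain very close to the original's on harmless prompts.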
| Quantization | File | Size |
|---|---|---|
| Q8_0 | Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf | 32.43 GB |
| Q6_K | Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf | 25.04 GB |
| Q4_K_M | Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf | 18.49 GB |
```bash
# Use the quantization tag you prefer:
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```
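Ollama also exposes a local REST API (by default at `http://localhost:11434`), so the model can be called programmatically. A minimal sketch of building a request payload for its `/api/generate` endpoint, using the model tag from the command above (the helper function is our own, not part of Ollama):

```python
import json

# Build the JSON payload for a POST to http://localhost:11434/api/generate.
# `quant` selects the quantization tag from the table above.
def build_generate_payload(prompt: str, quant: str = "Q8_0") -> dict:
    return {
        "model": f"hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:{quant}",
        "prompt": prompt,
        "stream": False,  # return a single JSON response instead of a stream
    }

payload = build_generate_payload("Write a Python function that reverses a string.")
print(json.dumps(payload, indent=2))
```

Send the payload with any HTTP client (e.g. `curl` or `requests.post`) while `ollama serve` is running.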
The full bf16 abliterated weights are available in the bf16/ subdirectory of this repository and can be loaded directly with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"

# Load the tokenizer and bf16 weights from the bf16/ subfolder
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
This model was processed by the Apostate automated abliteration pipeline. The pipeline uses directional ablation to remove the model's refusal directions while minimizing KL divergence from the original model's behavior on harmless prompts.
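The core idea of directional ablation can be sketched in a few lines: estimate a "refusal direction" in the model's hidden space, then project that component out of the hidden states. This is an illustrative NumPy sketch of the projection step only, not the actual Heretic implementation (which operates on real model activations and tunes the directions iteratively):

```python
import numpy as np

# Remove the component of each hidden state along an estimated refusal direction.
def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    d = direction / np.linalg.norm(direction)  # unit refusal direction
    # Subtract the projection of each row of `hidden` onto d
    return hidden - np.outer(hidden @ d, d)

rng = np.random.default_rng(0)
d = rng.normal(size=8)             # toy "refusal direction"
h = rng.normal(size=(4, 8))        # toy batch of hidden states
h_ablated = ablate_direction(h, d)

# After ablation, the states have (numerically) zero component along d:
print(np.allclose(h_ablated @ (d / np.linalg.norm(d)), 0.0))  # prints True
```

Because only one direction per layer is removed, the rest of the model's behavior is left largely intact, which is what keeps the KL divergence low.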
Base model: Qwen/Qwen2.5-32B