---
tags:
- heretic
- uncensored
- abliterated
- gguf
license: other
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
---
# Qwen2.5-Coder-32B-Instruct-heretic
Abliterated (uncensored) version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct),
created using [Heretic](https://github.com/p-e-w/heretic) and converted to GGUF.
## Abliteration Quality
| Metric | Value |
|:-------|------:|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |
A lower refusal count means the abliterated model refused fewer of the 100 test prompts; a lower KL divergence means its output distribution stays closer to the original model's on harmless prompts.
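For intuition, the KL divergence reported above compares next-token probability distributions between the original and abliterated models. The sketch below is a minimal illustration with made-up distributions, not Heretic's actual evaluation code:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions (in nats)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions; real values come from model logits.
original = [0.70, 0.20, 0.10]
abliterated = [0.68, 0.22, 0.10]

print(kl_divergence(original, abliterated))  # small value: behavior is close
```

A KL divergence of 0 would mean identical behavior on the evaluation prompts.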
## Available Quantizations
| Quantization | File | Size |
|:-------------|:-----|-----:|
| Q8_0 | [Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf) | 32.43 GB |
| Q6_K | [Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf) | 25.04 GB |
| Q4_K_M | [Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf) | 18.49 GB |
## Usage with Ollama
```bash
# Append the quantization tag you prefer (Q8_0, Q6_K, or Q4_K_M):
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```
## bf16 Weights
The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.
## Usage with Transformers
The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
## About
This model was processed by the **Apostate** automated abliteration pipeline:
1. The source model was loaded in bf16
2. Heretic's optimization-based abliteration was applied to remove refusal behavior
3. The merged model was converted to GGUF format using llama.cpp
4. Multiple quantization levels were generated
The abliteration process uses directional ablation to remove the model's refusal directions
while minimizing KL divergence from the original model's behavior on harmless prompts.
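The core idea of directional ablation can be sketched in a few lines: project the estimated "refusal direction" out of a weight matrix so the model can no longer write activations along it. This is an illustration only, not Heretic's implementation; the direction here is random, whereas a real pipeline estimates it from activation differences between refused and complied prompts.

```python
import numpy as np

def ablate_direction(W, d):
    """Remove direction d from W's output space: W' = (I - d d^T) W."""
    d = d / np.linalg.norm(d)        # normalize the refusal direction
    return W - np.outer(d, d) @ W    # subtract the component along d

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # stand-in for a transformer weight matrix
d = rng.standard_normal(8)           # stand-in refusal direction

W_ablated = ablate_direction(W, d)
# After ablation, W's columns have no component along d, so the model
# can no longer produce outputs in the refusal direction through W.
```

Heretic's optimization layer then searches over which layers and weights to ablate, and how strongly, to minimize both the refusal count and the KL divergence reported above.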