---
tags:
- heretic
- uncensored
- abliterated
- gguf
license: other
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
---

# Qwen2.5-Coder-32B-Instruct-heretic

Abliterated (uncensored) version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct),
created using [Heretic](https://github.com/p-e-w/heretic) and converted to GGUF.

## Abliteration Quality

| Metric | Value |
|:-------|------:|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |

A lower refusal count means the model refuses fewer of the 100 test prompts; a lower KL divergence means its output distribution stays closer to the original model's behavior.
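
As a rough illustration of the KL-divergence metric (a generic sketch, not Heretic's actual evaluation code), the divergence between the original and abliterated models' next-token distributions can be computed like this:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete probability distributions.

    P is the original model's next-token distribution and Q the
    abliterated model's; 0 means identical behavior.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary.
original    = [0.70, 0.20, 0.05, 0.05]
abliterated = [0.65, 0.24, 0.06, 0.05]

print(f"KL divergence: {kl_divergence(original, abliterated):.4f}")
```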

## Available Quantizations

| Quantization | File | Size |
|:-------------|:-----|-----:|
| Q8_0 | [Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf) | 32.43 GB |
| Q6_K | [Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf) | 25.04 GB |
| Q4_K_M | [Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf) | 18.49 GB |

## Usage with Ollama

```bash
# Use the quantization tag you prefer:
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```

## bf16 Weights

The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.

## Usage with Transformers

The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## About

This model was processed by the **Apostate** automated abliteration pipeline:
1. The source model was loaded in bf16
2. Heretic's optimization-based abliteration was applied to remove refusal behavior
3. The merged model was converted to GGUF format using llama.cpp
4. Multiple quantization levels were generated

The abliteration process uses directional ablation to remove the model's refusal directions
while minimizing KL divergence from the original model's behavior on harmless prompts.
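
Conceptually, directional ablation projects the estimated refusal direction out of the model's hidden states. Below is a minimal sketch of that projection step (illustrative only; Heretic's actual implementation discovers the directions and optimizes per-layer ablation weights):

```python
import numpy as np

def ablate_direction(hidden, refusal_dir):
    """Remove the component of `hidden` along `refusal_dir`.

    hidden:      (..., d) activation vectors
    refusal_dir: (d,) estimated refusal direction
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit vector
    # Subtract the projection onto r:  h' = h - (h . r) r
    return hidden - (hidden @ r)[..., None] * r

# After ablation, activations have no remaining component
# along the refusal direction.
h = np.random.randn(4, 8)
r = np.random.randn(8)
h_ablated = ablate_direction(h, r)
print(np.allclose(h_ablated @ (r / np.linalg.norm(r)), 0))
```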