---
tags:
- heretic
- uncensored
- abliterated
- gguf
license: other
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
---

# Qwen2.5-Coder-32B-Instruct-heretic

Abliterated (uncensored) version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), created using [Heretic](https://github.com/p-e-w/heretic) and converted to GGUF.

## Abliteration Quality

| Metric | Value |
|:-------|------:|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |

A lower refusal count means fewer of the evaluation prompts were refused; a lower KL divergence means the abliterated model's outputs stay closer to the original model's behavior.

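Conceptually, the KL divergence figure compares the abliterated model's next-token distribution with the original's, averaged over harmless prompts. A minimal sketch of what a single such comparison looks like (the helper function and the toy distributions below are illustrative, not Heretic's actual code):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same vocabulary.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 4-token vocabulary.
p_original = [0.70, 0.20, 0.05, 0.05]     # original model
p_abliterated = [0.65, 0.22, 0.07, 0.06]  # abliterated model

print(kl_divergence(p_original, p_abliterated))  # small positive number
```

A value near zero, as in the table above, indicates the two models assign nearly identical probabilities on such prompts.
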
## Available Quantizations

| Quantization | File | Size |
|:-------------|:-----|-----:|
| Q8_0 | [Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf) | 32.43 GB |
| Q6_K | [Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf) | 25.04 GB |
| Q4_K_M | [Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf) | 18.49 GB |

## Usage with Ollama

```bash
# Use the quantization tag you prefer:
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```

## bf16 Weights

The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.

## Usage with Transformers

The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## About

This model was processed by the **Apostate** automated abliteration pipeline:

1. The source model was loaded in bf16
2. Heretic's optimization-based abliteration was applied to remove refusal behavior
3. The merged model was converted to GGUF format using llama.cpp
4. Multiple quantization levels were generated

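Steps 3 and 4 correspond to llama.cpp's standard conversion and quantization tools. A hedged sketch of roughly equivalent commands (paths and output filenames are illustrative, and script names vary between llama.cpp versions):

```bash
# Convert the merged bf16 checkpoint to GGUF
python convert_hf_to_gguf.py ./Qwen2.5-Coder-32B-Instruct-heretic \
    --outfile Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf

# Generate the quantized variants
llama-quantize Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf \
    Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf Q8_0
llama-quantize Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf \
    Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf Q4_K_M
```
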
The abliteration process uses directional ablation to remove the model's refusal directions while minimizing KL divergence from the original model's behavior on harmless prompts.

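In rough terms, directional ablation estimates a "refusal direction" `r` in the model's residual stream and removes the component along `r` from the outputs of the weight matrices that write to it, by subtracting the projection onto `r`. A minimal numpy sketch under that assumption (the random direction and matrix are stand-ins; Heretic's per-layer optimization of directions and strengths is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8
W = rng.normal(size=(d_model, d_model))  # stand-in for a weight matrix writing to the residual stream
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)                   # unit "refusal direction"

# Project the refusal direction out of W's output: W' = (I - r r^T) W
W_ablated = W - np.outer(r, r) @ W

# The ablated matrix's output has no remaining component along r.
print(np.abs(r @ W_ablated).max())  # ~0 up to float error
```

Because the projector only removes a single rank-one component, the rest of the model's behavior is left largely intact, which is what keeps the KL divergence low.
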