---
tags:
- heretic
- uncensored
- abliterated
- gguf
license: other
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
---

# Qwen2.5-Coder-32B-Instruct-heretic

Abliterated (uncensored) version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), created using [Heretic](https://github.com/p-e-w/heretic) and converted to GGUF.

## Abliteration Quality

| Metric | Value |
|:-------|------:|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |

A lower refusal count means fewer of the evaluation prompts were refused; a lower KL divergence means the abliterated model's outputs stay closer to the original model's behavior.

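Conceptually, the KL divergence figure compares the abliterated model's next-token distribution with the original's, averaged over harmless prompts. A minimal sketch of what a single such comparison looks like (the helper function and the toy distributions below are illustrative, not Heretic's actual code):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same vocabulary.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 4-token vocabulary.
p_original = [0.70, 0.20, 0.05, 0.05]     # original model
p_abliterated = [0.65, 0.22, 0.07, 0.06]  # abliterated model

print(kl_divergence(p_original, p_abliterated))  # small positive number
```

A value near zero, as in the table above, indicates the two models assign nearly identical probabilities on such prompts.
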
## Available Quantizations

| Quantization | File | Size |
|:-------------|:-----|-----:|
| Q8_0 | [Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf) | 32.43 GB |
| Q6_K | [Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q6_K.gguf) | 25.04 GB |
| Q4_K_M | [Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf](./Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf) | 18.49 GB |

## Usage with Ollama

```bash
# Use the quantization tag you prefer:
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```

## bf16 Weights

The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.

## Usage with Transformers

The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## About

This model was processed by the **Apostate** automated abliteration pipeline:

1. The source model was loaded in bf16
2. Heretic's optimization-based abliteration was applied to remove refusal behavior
3. The merged model was converted to GGUF format using llama.cpp
4. Multiple quantization levels were generated

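Steps 3 and 4 correspond to llama.cpp's standard conversion and quantization tools. A hedged sketch of roughly equivalent commands (paths and output filenames are illustrative, and script names vary between llama.cpp versions):

```bash
# Convert the merged bf16 checkpoint to GGUF
python convert_hf_to_gguf.py ./Qwen2.5-Coder-32B-Instruct-heretic \
    --outfile Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf

# Generate the quantized variants
llama-quantize Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf \
    Qwen2.5-Coder-32B-Instruct-heretic-Q8_0.gguf Q8_0
llama-quantize Qwen2.5-Coder-32B-Instruct-heretic-f16.gguf \
    Qwen2.5-Coder-32B-Instruct-heretic-Q4_K_M.gguf Q4_K_M
```
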
The abliteration process uses directional ablation to remove the model's refusal directions while minimizing KL divergence from the original model's behavior on harmless prompts.

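In rough terms, directional ablation estimates a "refusal direction" `r` in the model's residual stream and removes the component along `r` from the outputs of the weight matrices that write to it, by subtracting the projection onto `r`. A minimal numpy sketch under that assumption (the random direction and matrix are stand-ins; Heretic's per-layer optimization of directions and strengths is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8
W = rng.normal(size=(d_model, d_model))  # stand-in for a weight matrix writing to the residual stream
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)                   # unit "refusal direction"

# Project the refusal direction out of W's output: W' = (I - r r^T) W
W_ablated = W - np.outer(r, r) @ W

# The ablated matrix's output has no remaining component along r.
print(np.abs(r @ W_ablated).max())  # ~0 up to float error
```

Because the projector only removes a single rank-one component, the rest of the model's behavior is left largely intact, which is what keeps the KL divergence low.
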