Apostate Models Collection
Uncensored LLMs converted using an automated, iterative heretic pipeline. Qwen2.5-Coder-32B is the strongest model in the collection, followed by Qwen3 and GLM-4.7-Flash.
Abliterated (uncensored) version of zai-org/GLM-4.7-Flash, created using Heretic and converted to GGUF.
| Metric | Value |
|---|---|
| Refusals | 6/100 |
| KL Divergence | 0.0071 |
| Rounds | 1 |
Refusals counts how many of 100 test prompts the model refused; lower is better. Lower KL divergence means behavior on harmless prompts stays closer to the original model.
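The KL divergence metric compares the abliterated model's output distribution to the original's on harmless prompts. As a minimal sketch (not the pipeline's actual code), the per-token divergence can be computed from the two models' logits:

```python
import numpy as np

def kl_divergence(logits_p, logits_q):
    # KL(P || Q) over the vocabulary at one token position,
    # computed from raw logits via a numerically stable softmax.
    p = np.exp(logits_p - logits_p.max()); p /= p.sum()
    q = np.exp(logits_q - logits_q.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits give zero divergence; a small perturbation
# (standing in for the abliterated model) gives a small value.
orig = np.array([2.0, 1.0, 0.5, -1.0])
ablit = orig + np.array([0.01, -0.02, 0.0, 0.01])
print(kl_divergence(orig, orig))          # 0.0
print(kl_divergence(orig, ablit) < 0.01)  # True
```

Averaging this quantity over many token positions on a harmless-prompt set yields a score comparable to the 0.0071 reported above.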
| Quantization | File | Size |
|---|---|---|
| Q8_0 | GLM-4.7-Flash-heretic-Q8_0.gguf | 31.80 GB |
| Q6_K | GLM-4.7-Flash-heretic-Q6_K.gguf | 22.92 GB |
| Q4_K_M | GLM-4.7-Flash-heretic-Q4_K_M.gguf | 16.89 GB |
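As a rough sanity check on the table, Q8_0 stores blocks of 32 int8 weights plus an fp16 scale, for an effective ~8.5 bits per weight, so the file size implies an approximate parameter count (the bits-per-weight figure is a property of the GGUF Q8_0 format; the inferred count is an estimate, not an official spec):

```python
# Back-of-the-envelope parameter estimate from the Q8_0 file size.
# Q8_0: 32 int8 values + one fp16 scale per block = 272 bits / 32
# weights = 8.5 effective bits per weight.
size_bytes = 31.80e9
bits_per_weight = 8.5
params = size_bytes * 8 / bits_per_weight
print(f"~{params / 1e9:.1f}B parameters")  # roughly 30B
```

The K-quants (Q6_K, Q4_K_M) mix several sub-formats across tensors, so their effective bits per weight vary by model and are not estimated here.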
```shell
ollama run hf.co/ThalisAI/GLM-4.7-Flash-heretic:Q8_0
ollama run hf.co/ThalisAI/GLM-4.7-Flash-heretic:Q6_K
ollama run hf.co/ThalisAI/GLM-4.7-Flash-heretic:Q4_K_M
```
The full bf16 abliterated weights are available in the bf16/ subdirectory of this repository and can be loaded directly with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/GLM-4.7-Flash-heretic"

# Load the bf16 weights from the bf16/ subfolder of the repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
This model was processed by the Apostate automated abliteration pipeline. The pipeline uses directional ablation to remove the model's refusal direction while minimizing KL divergence from the original model's behavior on harmless prompts.
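A toy sketch of directional ablation, under the common assumption that the refusal direction can be estimated as the difference of mean activations on harmful versus harmless prompts (the data and dimensions here are synthetic placeholders, not the pipeline's actual procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Synthetic "activations": harmful prompts carry an extra component
# along a hidden refusal direction.
harmless = rng.normal(size=(100, d))
true_dir = rng.normal(size=d)
harmful = rng.normal(size=(100, d)) + 3.0 * true_dir

# Estimate the refusal direction as the difference of means, normalized.
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(h, r):
    # Project out the refusal direction: h - (h . r) r
    return h - np.outer(h @ r, r)

ablated = ablate(harmful, r)
# After ablation, activations have (near-)zero component along r.
print(np.abs(ablated @ r).max() < 1e-9)  # True
```

In a real pipeline the same projection is typically folded into the model's weight matrices so the direction is removed at every layer, rather than applied to activations at inference time.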
Base model: zai-org/GLM-4.7-Flash