---
base_model:
  - microsoft/Phi-4-mini-instruct
tags:
  - heretic
  - uncensored
  - decensored
  - abliterated
  - mpoa
---

This is a decensored version of Phi-4-mini-instruct, made with Heretic v1.2.0 and tuned for zero refusals at low KL divergence.

## KL Divergence

| Metric | This Model | Original Model |
|---|---|---|
| KL divergence | 0.0827 | 0 (by definition) |
| Refusals | 0/108 | 107/108 |
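
The KL divergence reported above measures how far the decensored model's next-token distribution has drifted from the original model's, averaged over a set of prompts; the original scores 0 against itself by definition. A minimal sketch of the per-position computation (the exact prompt set and averaging used by Heretic are not specified here):

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given raw logits.

    P plays the role of the original model, Q the abliterated model;
    identical logits give exactly 0, matching the "0 (by definition)" row.
    """
    p_logits = np.asarray(p_logits, dtype=np.float64)
    q_logits = np.asarray(q_logits, dtype=np.float64)
    # Numerically stable log-softmax for each distribution.
    p_log = p_logits - p_logits.max()
    p_log -= np.log(np.exp(p_log).sum())
    q_log = q_logits - q_logits.max()
    q_log -= np.log(np.exp(q_log).sum())
    p = np.exp(p_log)
    return float((p * (p_log - q_log)).sum())

print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

A low value like 0.0827 indicates the model's output distribution stays close to the original on non-refusal prompts, which is why the benchmark scores below barely move.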

## Abliteration parameters

- Zero refusals with a KL divergence of 0.0827
- Custom Heretic training dataset
- Model-targeted Heretic configuration
- Abliterated with MPOA (Magnitude-Preserving Orthogonal Ablation) enabled
- Full row renormalization
- Winsorization quantile: 0.997
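
The core idea behind the parameters above can be sketched as follows: project the component along an estimated "refusal direction" out of each weight row, then rescale every row back to its original L2 norm (the "full row renormalization" that makes the ablation magnitude-preserving), with extreme activations winsorized before the direction is estimated. This is a simplified illustration of the concept only; Heretic's actual implementation may differ in detail:

```python
import numpy as np

def winsorize(x, q=0.997):
    """Clip extreme values to the q-th quantile of |x| before
    estimating the refusal direction (the 0.997 setting above)."""
    lim = np.quantile(np.abs(x), q)
    return np.clip(x, -lim, lim)

def ablate_mpoa(W, refusal_dir):
    """Magnitude-preserving orthogonal ablation, sketched.

    Removes each row's component along `refusal_dir`, then restores
    the row's original L2 norm so weight magnitudes are preserved.
    """
    d = refusal_dir / np.linalg.norm(refusal_dir)
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_abl = W - np.outer(W @ d, d)              # orthogonal projection
    new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
    return W_abl * (orig_norms / np.maximum(new_norms, 1e-12))
```

After ablation every row is orthogonal to the refusal direction while keeping its original magnitude, which is what lets the model stop refusing without the large capability loss that plain zeroing of weights can cause.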

The following benchmarks are for the quantized versions of this model.

## Relative Perplexity

| Quant | Filename | PPL ± Error |
|---|---|---|
| Q8_0 | Phi-4-mini-instruct.Q8_0.gguf (original baseline) | 8.2182 ± 0.05385 |
| Q8_0 | Phi-4-mini-instruct-heretic-v1.2-Q8_0.gguf | 8.2399 ± 0.05397 |
| Q4_K_M | Phi-4-mini-instruct-heretic-v1.2-Q4_K_M.gguf | 8.6408 ± 0.05740 |
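
Perplexity is the exponential of the mean negative log-likelihood per token, so the near-identical Q8_0 scores (8.2182 vs 8.2399) mean the decensored weights assign almost the same probability to held-out text as the originals. A minimal sketch of the formula (not the windowed evaluation a tool like llama.cpp performs):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# If every token were assigned probability 0.5, perplexity would be 2.
print(perplexity([math.log(0.5)] * 4))  # 2.0
```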

## Benchmark Comparison

| Benchmark | Phi-4-mini-instruct.Q8_0.gguf | Phi-4-mini-instruct-Q4_K_M.gguf | Phi-4-mini-instruct-heretic-v1.2-Q4_K_M.gguf |
|---|---|---|---|
| Perplexity (Wikitext-2) | 8.2182 | 8.6141 | 8.6408 |
| HellaSwag | 70.50% | 72.00% | 71.25% |
| Winogrande | 71.90% | 71.27% | 70.80% |
| ARC-Challenge | 56.86% | 54.18% | 54.52% |
| MMLU | 40.47% | 40.38% | 40.72% |

*Note: the MMLU scores exclude the moral_scenarios, moral_disputes, business_ethics, professional_law and jurisprudence subjects.*