# Phi-4-Mini Abliterated
Phi-4-Mini-Instruct with refusal behaviors removed via heretic abliteration, by DuoNeural.
This model retains the full instruction-following capability and reasoning quality of the original Phi-4-Mini while operating without built-in refusal patterns. It is suitable for research, creative writing, red-teaming, and applications where content filtering is handled at the application layer.
## Model Details
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-4-mini-instruct |
| Parameters | 3.8B |
| Architecture | Phi-4 Mini |
| Precision | BF16 |
| Method | Heretic abliteration (TPE, dual-objective: refusal removal + KL minimization) |
| Format | Safetensors (2 shards) |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "DuoNeural/Phi-4-Mini-Abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header so the model responds
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
## Abliteration Method
Abliteration was performed using heretic with Optuna TPE search over 100 trials, optimizing dual objectives: minimize refusal rate on adversarial prompts while minimizing KL divergence from the original model's output distribution. The Pareto-optimal checkpoint was selected to maximize refusal removal while preserving general capability.
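The Pareto-optimal selection over the two objectives can be sketched in a few lines of plain Python; the trial values below are hypothetical stand-ins, not the actual search results:

```python
def pareto_front(trials):
    """Return trials not dominated on (refusal_rate, kl_div), both minimized.
    A trial dominates another if it is <= on both objectives and < on at least one."""
    front = []
    for i, (r, k) in enumerate(trials):
        dominated = any(
            (r2 <= r and k2 <= k) and (r2 < r or k2 < k)
            for j, (r2, k2) in enumerate(trials)
            if j != i
        )
        if not dominated:
            front.append((r, k))
    return front

# Hypothetical (refusal_rate, KL divergence) pairs from a TPE search
trials = [(0.40, 0.01), (0.05, 0.30), (0.10, 0.08), (0.12, 0.20), (0.08, 0.15)]
front = pareto_front(trials)

# Pick the front member with the lowest refusal rate, using KL as tiebreaker
best = min(front, key=lambda t: (t[0], t[1]))
print(front, best)
```

In practice heretic delegates the trial loop to Optuna's TPE sampler; this sketch only shows how a final checkpoint is chosen once the trial scores are in hand.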
Abliteration was run in BF16 full precision on an RTX 5090 (Blackwell). No 4-bit quantization was used during abliteration to avoid the throughput penalties and weight modification issues that affect sub-7B models under NF4.
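The directional-ablation step that the search tunes can be illustrated with a toy NumPy sketch: estimate a refusal direction as the difference of mean activations on harmful vs. harmless prompts, then project it out of a weight matrix. The dimensions, data, and helper names here are illustrative, not heretic's API:

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the refusal direction as the normalized difference of mean
    activations on harmful vs. harmless prompts (difference-of-means)."""
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate(weight, direction):
    """Remove the refusal direction from a weight matrix by projecting its
    rows onto the orthogonal complement: W <- W - d d^T W."""
    d = direction.reshape(-1, 1)
    return weight - d @ (d.T @ weight)

rng = np.random.default_rng(0)
hidden = 8
# Toy activations: harmful prompts shifted along the first hidden dimension
harmful = rng.normal(size=(16, hidden)) + np.array([3.0] + [0.0] * (hidden - 1))
harmless = rng.normal(size=(16, hidden))

d = refusal_direction(harmful, harmless)
W = rng.normal(size=(hidden, hidden))
W_abl = ablate(W, d)

# The ablated weights have (near-)zero component along the refusal direction
print(np.allclose(d @ W_abl, 0.0, atol=1e-8))
```

The TPE search then decides which layers to ablate and with what strength, scoring each candidate on the two objectives above.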
## Intended Use
- Research and red-teaming
- Creative and generative applications requiring unconstrained output
- Applications where content policy is enforced at the system/application layer
- Evaluation of base model capabilities without safety fine-tuning influence
## Limitations
This model has had safety fine-tuning removed. It will comply with requests that the original model would refuse. Users are responsible for appropriate use in accordance with applicable laws and policies.
## Related

- LiteRT version: coming soon at DuoNeural/Phi-4-Mini-Abliterated-LiteRT, optimized for on-device inference on Android and edge deployments.
## DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.
| Platform | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| Website | duoneural.com |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
| Newsletter | duoneural.beehiiv.com |
| Support | buymeacoffee.com/duoneural |
## DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).
## Research Team

- Jesse: vision, hardware, direction
- Archon: lab director, post-training, abliteration, experiments
- Aura: research AI, literature synthesis, novel proposals
Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.