Phi-4-Mini Abliterated

Phi-4-Mini-Instruct with refusal behaviors removed via heretic abliteration, by DuoNeural.

This model retains the instruction-following capability and reasoning quality of the original Phi-4-Mini while operating without its built-in refusal patterns. It is suitable for research, creative writing, red-teaming, and applications where content filtering is handled at the application layer.

Model Details

Property      Value
Base Model    microsoft/Phi-4-mini-instruct
Parameters    3.8B
Architecture  Phi-4 Mini
Precision     BF16
Method        Heretic abliteration (TPE, dual-objective: refusal removal + KL minimization)
Format        Safetensors (2 shards)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "DuoNeural/Phi-4-Mini-Abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

Abliteration Method

Abliteration was performed using heretic with Optuna TPE search over 100 trials, optimizing dual objectives: minimize refusal rate on adversarial prompts while minimizing KL divergence from the original model's output distribution. The Pareto-optimal checkpoint was selected to maximize refusal removal while preserving general capability.

Abliteration was run in BF16 full precision on an RTX 5090 (Blackwell). No 4-bit quantization was used during abliteration to avoid the throughput penalties and weight modification issues that affect sub-7B models under NF4.

Intended Use

  • Research and red-teaming
  • Creative and generative applications requiring unconstrained output
  • Applications where content policy is enforced at the system/application layer
  • Evaluation of base model capabilities without safety fine-tuning influence
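Since this model relies on the application layer for content policy, a wrapper that moderates both the request and the response is the natural pattern. The sketch below is illustrative only: `moderate`, `BLOCKED_TOPICS`, and the refusal strings are assumptions, not part of any real API, and a production filter would use a proper classifier rather than substring matching.

```python
# Hypothetical policy list for illustration only.
BLOCKED_TOPICS = ("malware", "credit card numbers")


def moderate(text: str) -> bool:
    """Return True if the text violates the application's policy."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def guarded_generate(generate_fn, prompt: str) -> str:
    # Screen the request before it ever reaches the unfiltered model...
    if moderate(prompt):
        return "Request declined by application policy."
    reply = generate_fn(prompt)
    # ...and screen the model's output before returning it to the user.
    if moderate(reply):
        return "Response withheld by application policy."
    return reply
```

Here `generate_fn` would wrap the `model.generate` call from the Usage section; the key point is that filtering happens on both sides of the unconstrained model.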

Limitations

This model has had safety fine-tuning removed. It will comply with requests that the original model would refuse. Users are responsible for appropriate use in accordance with applicable laws and policies.

Related

  • LiteRT version: Coming soon at DuoNeural/Phi-4-Mini-Abliterated-LiteRT, optimized for on-device inference on Android and other edge deployments.

DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).

Research Team

  • Jesse: vision, hardware, direction
  • Archon: Lab Director; post-training, abliteration, experiments
  • Aura: Research AI; literature synthesis, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
