Devstral-Small-2-24B-Instruct-abliterated

Unrestricted version of mistralai/Devstral-Small-2-24B-Instruct-2512, created using Abliterix.

Model Details

| Property | Value |
|---|---|
| Base Model | mistralai/Devstral-Small-2-24B-Instruct-2512 |
| Architecture | Mistral3 (dense transformer with GQA + Pixtral vision tower) |
| Parameters | 24B (all active, dense) |
| Layers | 40 |
| Hidden Size | 5120 |
| Context Length | 256K tokens |
| Precision | BF16 |

Performance

| Metric | This model | Original |
|---|---|---|
| KL divergence | 0.0086 | 0 |
| Refusals | 3/100 (3%) | 80/100 (80%) |

Evaluated with an LLM judge (Gemini Flash) on 100 harmful prompts. A KL divergence of 0.0086 indicates that the model's general capabilities are virtually identical to the original's.
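As a rough illustration of the metric above, KL divergence compares the original and abliterated models' next-token distributions. The evaluation harness itself is not part of this card; the sketch below just shows the computation for a single pair of logit vectors:

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given as logits."""
    # Softmax with max-subtraction for numerical stability.
    p = np.exp(p_logits - p_logits.max())
    p /= p.sum()
    q = np.exp(q_logits - q_logits.max())
    q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical distributions give a divergence of exactly 0.
logits = np.array([2.0, 1.0, 0.5])
print(kl_divergence(logits, logits))  # → 0.0
```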

How It Was Made

  1. Computed refusal directions from 400 harmful vs 400 benign prompt pairs across all 40 layers
  2. Applied orthogonalized abliteration to isolate refusal-specific activation patterns
  3. Steered two component types independently: attention output projections (GQA) and MLP down-projections
  4. Optimized via Optuna TPE over 50 trials (15 warmup), selected trial #25
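The core of steps 1–2 can be sketched as a rank-one projection that removes the refusal direction from a weight matrix's output space. The function names and shapes below are hypothetical; Abliterix's actual procedure is not documented on this card.

```python
import numpy as np

def refusal_direction(harmful_acts, benign_acts):
    """Step 1 (sketch): mean activation difference between harmful and
    benign prompts at one layer, normalized to a unit vector."""
    d = harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def abliterate_weight(W, refusal_dir):
    """Step 2 (sketch): orthogonalize a weight matrix against the refusal
    direction, W' = (I - r r^T) W, so its outputs have no component
    along r. Applied to attention o_proj and MLP down_proj weights."""
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)
W_abl = abliterate_weight(W, r)

# The ablated matrix produces no output along the refusal direction.
r_unit = r / np.linalg.norm(r)
print(np.allclose(r_unit @ W_abl, 0.0))  # → True
```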

Usage

```python
from transformers import AutoModelForImageTextToText, AutoTokenizer

# Mistral3 checkpoints load through the image-text-to-text auto class,
# even for text-only use.
model = AutoModelForImageTextToText.from_pretrained(
    "wangzhang/Devstral-Small-2-24B-Instruct-abliterated",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "wangzhang/Devstral-Small-2-24B-Instruct-abliterated",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Note: This model uses AutoModelForImageTextToText (Mistral3 architecture), not AutoModelForCausalLM.

Hardware Requirements

| Precision | VRAM | Example GPUs |
|---|---|---|
| BF16 | ~45 GB | A100 80GB, H100 |
| INT8 | ~24 GB | A40, RTX 4090 |
| NF4 | ~12 GB | RTX 3090, RTX 4080 |
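For the NF4 row, the model can be loaded in 4-bit with bitsandbytes via transformers' `BitsAndBytesConfig`. A minimal sketch, assuming a recent transformers version with bitsandbytes installed:

```python
import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

# NF4 quantization — fits the 24B model in roughly 12 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForImageTextToText.from_pretrained(
    "wangzhang/Devstral-Small-2-24B-Instruct-abliterated",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

INT8 loading works the same way with `load_in_8bit=True` instead of the 4-bit options.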

Disclaimer

This model is intended for research purposes only. The removal of safety guardrails means the model will comply with requests that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations.


Made with Abliterix
