Dolphin V2 8B Abliterated

An uncensored 8B-parameter language model built on Qwen3-8B, fine-tuned on 1.35M high-quality instruction samples and abliterated to remove refusal behavior. Developed as part of TPU Research Cloud (TRC) research.

Model Details

  • Architecture: Qwen3ForCausalLM (36 layers, 4096 hidden, 32 attn heads, 8 KV heads)
  • Parameters: 8.2B
  • Context Length: 4096 (trained), 40960 (max supported)
  • Precision: bfloat16
  • License: Apache 2.0

Training

SFT Phase

  • Base model: Qwen/Qwen3-8B
  • Hardware: Google Cloud TPU v6e-16 (spot)
  • Framework: MaxText (JAX)
  • Steps: 130,000 (~3 epochs)
  • Learning rate: 5e-6 with cosine decay
  • Warmup: 200 steps
  • Effective batch size: 16
  • Sequence length: 4096

Training Dataset (1.35M samples)

| Dataset | Samples | Purpose |
|---|---|---|
| NousResearch/Hermes-3-Dataset | ~959K | Core uncensored assistant behavior |
| allenai/tulu-3-sft-mixture | ~200K | Diverse instruction following |
| HuggingFaceTB/smoltalk (magpie-ultra) | ~100K | High-quality diverse tasks |
| HuggingFaceTB/smoltalk (numina-cot) | ~50K | Math reasoning |
| HuggingFaceTB/smoltalk (self-oss-instruct) | ~50K | Code generation |
| LDJnr/Capybara | ~16K | Multi-turn conversations |

All data was filtered to remove refusal patterns, safety-alignment subsets, and <think> reasoning tags.
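A minimal sketch of that filtering step. The actual phrase list used for the data cleaning is not published; the refusal phrases below are illustrative examples only:

```python
# Illustrative refusal filter -- the real pattern list is not published;
# these phrases are examples, not the ones actually used.
REFUSAL_PHRASES = [
    "i can't assist",
    "i cannot help with",
    "as an ai language model",
    "i'm sorry, but",
]

def keep_sample(assistant_text: str) -> bool:
    """Return False for samples with refusal phrases or <think> tags."""
    lowered = assistant_text.lower()
    if any(phrase in lowered for phrase in REFUSAL_PHRASES):
        return False
    if "<think>" in assistant_text:
        return False
    return True

print(keep_sample("Sure, here is the answer."))          # True
print(keep_sample("I can't assist with that request."))  # False
```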

Abliteration Phase

After SFT, the model was abliterated using the weight orthogonalization technique from Arditi et al. (2024) to remove residual refusal behavior.

  • Technique: Multi-direction abliteration (weight orthogonalization)
  • Directions removed: 5
  • Target layers: 35, 34, 36, 33, 16 (selected by highest refusal direction scores)
  • Samples used: 256 harmful/harmless instruction pairs
  • Method: For each selected layer, the refusal direction was identified via mean difference between harmful and harmless activations, then orthogonalized out of the weight matrices.
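The orthogonalization step can be sketched in a few lines of NumPy: given a unit refusal direction v in the residual-stream space, a weight matrix W that writes into the residual stream is replaced by W − v vᵀ W, so its outputs have no component along v. This is a toy illustration of the technique, not the actual abliteration code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # toy dimension; the real model uses 4096

# Refusal direction: mean difference between activations on harmful
# and harmless instructions (random stand-ins here).
harmful = rng.normal(size=(256, d_model))
harmless = rng.normal(size=(256, d_model))
v = harmful.mean(axis=0) - harmless.mean(axis=0)
v /= np.linalg.norm(v)  # unit vector

# Orthogonalize a weight matrix that writes into the residual stream:
# W' = W - v v^T W removes the component of every output along v.
W = rng.normal(size=(d_model, d_model))
W_abl = W - np.outer(v, v) @ W

x = rng.normal(size=d_model)
print(abs(float(v @ (W_abl @ x))))  # ~0: the layer can no longer write along v
```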

Benchmark Results

Evaluated using lm-evaluation-harness with 200 samples per task, 5-shot (except TruthfulQA, which is 0-shot).

| Benchmark | Metric | Score |
|---|---|---|
| ARC-Challenge | acc | 56.5% |
| ARC-Challenge | acc_norm | 54.0% |
| HellaSwag | acc_norm | 64.5% |
| TruthfulQA | MC2 acc | 48.8% |
| Winogrande | acc | 57.0% |
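The evaluation setup above can be reproduced with commands along these lines (task names assumed from the current lm-evaluation-harness; `--limit 200` caps each task at 200 samples):

```shell
# 5-shot tasks, 200 samples each
lm_eval --model hf \
  --model_args pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16 \
  --tasks arc_challenge,hellaswag,winogrande \
  --num_fewshot 5 --limit 200

# TruthfulQA MC2 is evaluated 0-shot
lm_eval --model hf \
  --model_args pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16 \
  --tasks truthfulqa_mc2 \
  --num_fewshot 0 --limit 200
```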

GGUF Quantizations

| File | Quant | Size | Description |
|---|---|---|---|
| dolphin-v2-8b-abliterated-Q8_0.gguf | Q8_0 | 8.3 GB | Best quality quantization |
| dolphin-v2-8b-abliterated-Q4_K_M.gguf | Q4_K_M | 4.8 GB | Good balance of quality and size |

Usage with llama.cpp

llama-server -m dolphin-v2-8b-abliterated-Q8_0.gguf -ngl 99 -c 4096

Usage with Ollama

# Create a Modelfile
echo 'FROM ./dolphin-v2-8b-abliterated-Q8_0.gguf' > Modelfile
ollama create dolphin-v2-abliterated -f Modelfile
ollama run dolphin-v2-abliterated

Usage with Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0arch-io/dolphin-v2-8b-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("0arch-io/dolphin-v2-8b-abliterated")

messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Disclaimer

This is a research model with no content filters. It will comply with any request without refusing. The creators are not responsible for how this model is used. Use responsibly.

Acknowledgments

  • Qwen team for the Qwen3-8B base model
  • Google TRC for TPU compute
  • NousResearch for the Hermes-3 dataset
  • Arditi et al. for the abliteration technique
  • Built with MaxText on Google Cloud TPU