Instructions for using 0arch-io/dolphin-v2-8b-abliterated with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use 0arch-io/dolphin-v2-8b-abliterated with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="0arch-io/dolphin-v2-8b-abliterated",
    filename="dolphin-v2-8b-abliterated-Q4_K_M.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
- Local Apps
- llama.cpp
How to use 0arch-io/dolphin-v2-8b-abliterated with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
Use Docker
docker model run hf.co/0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
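Because `llama-server` exposes an OpenAI-compatible HTTP API, it can also be called from Python. The sketch below uses only the standard library and assumes the server is running locally on llama-server's default port 8080 with the `/v1/chat/completions` endpoint. The helper names `build_chat_request` and `chat` are hypothetical, and the sampling values are illustrative rather than recommended settings.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, base_url=BASE_URL, model=None, max_tokens=512, temperature=0.7):
    """Build (without sending) an OpenAI-style chat completion request."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if model is not None:
        # Optional; servers that host several models use this field to pick one.
        payload["model"] = model
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url=BASE_URL, model=None):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url, model)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hello, how are you?"))` should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.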
- vLLM
How to use 0arch-io/dolphin-v2-8b-abliterated with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "0arch-io/dolphin-v2-8b-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "0arch-io/dolphin-v2-8b-abliterated",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
- Ollama
How to use 0arch-io/dolphin-v2-8b-abliterated with Ollama:
ollama run hf.co/0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
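Once `ollama run` has pulled the model, Ollama also serves it over its local REST API. A minimal standard-library sketch against the `/api/chat` endpoint, assuming Ollama's default address `http://localhost:11434`; streaming is disabled so the reply arrives as a single JSON object. The helper names `build_ollama_chat` and `ollama_chat` are hypothetical.

```python
import json
import urllib.request

# Model tag as used by the `ollama run` command above.
MODEL = "hf.co/0arch-io/dolphin-v2-8b-abliterated:Q4_K_M"
# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_ollama_chat(prompt, model=MODEL, base_url=OLLAMA_URL):
    """Build (without sending) a non-streaming /api/chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON response instead of a stream of chunks
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_chat(prompt):
    """Send the request and return the assistant message content."""
    with urllib.request.urlopen(build_ollama_chat(prompt)) as resp:
        return json.load(resp)["message"]["content"]
```

With streaming left on (Ollama's default), the server instead returns newline-delimited JSON chunks, which would need to be read and concatenated line by line.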
- Unsloth Studio
How to use 0arch-io/dolphin-v2-8b-abliterated with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 0arch-io/dolphin-v2-8b-abliterated to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 0arch-io/dolphin-v2-8b-abliterated to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for 0arch-io/dolphin-v2-8b-abliterated to start chatting
- Docker Model Runner
How to use 0arch-io/dolphin-v2-8b-abliterated with Docker Model Runner:
docker model run hf.co/0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
- Lemonade
How to use 0arch-io/dolphin-v2-8b-abliterated with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull 0arch-io/dolphin-v2-8b-abliterated:Q4_K_M
Run and chat with the model
lemonade run user.dolphin-v2-8b-abliterated-Q4_K_M
List all available models
lemonade list
Dolphin V2 8B Abliterated
An uncensored 8B-parameter language model built on Qwen3-8B, fine-tuned on 1.35M high-quality instruction samples and then abliterated to remove refusal behavior. Developed for research using Google's TPU Research Cloud (TRC).
Model Details
- Architecture: Qwen3ForCausalLM (36 layers, hidden size 4096, 32 attention heads, 8 KV heads)
- Parameters: 8.2B
- Context Length: 4,096 tokens (training), 40,960 tokens (maximum supported)
- Precision: bfloat16
- License: Apache 2.0
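The attention configuration above determines how much memory the key/value cache needs as the context grows. A back-of-the-envelope sketch, assuming a head dimension of 128 (hidden size 4096 divided by 32 attention heads) and an unquantized 16-bit cache; real runtimes add their own buffers on top of this.

```python
# Architecture values from the Model Details list above.
NUM_LAYERS = 36
NUM_KV_HEADS = 8         # grouped-query attention: 32 query heads share 8 KV heads
HEAD_DIM = 4096 // 32    # assumption: head_dim = hidden_size / num_attention_heads = 128
BYTES_PER_VALUE = 2      # assumption: 16-bit (bf16/fp16) cache entries


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * num_tokens


for ctx in (4096, 40960):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 2**20:,.0f} MiB")
```

This works out to 147,456 bytes (144 KiB) per token, so about 576 MiB at the 4,096-token training length and 5,760 MiB at the 40,960-token maximum. With 32 KV heads instead of 8 (no grouped-query attention), each figure would be four times larger.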
Training
SFT Phase
- Base model: Qwen/Qwen3-8B
- Hardware: Google Cloud TPU v6e-16 (spot)
- Framework: MaxText (JAX)
- Steps: 130,000 (~3 epochs)
- Learning rate: 5e-6 with cosine decay
- Warmup: 200 steps
- Effective batch size: 16
- Sequence length: 4096
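The schedule above (200 warmup steps, then cosine decay from a 5e-6 peak over 130,000 steps) can be sketched as a function of the step. The card does not state the warmup shape or the final learning rate, so this assumes a linear warmup from 0 and decay to 0 at the last step; MaxText's actual configuration may differ.

```python
import math

PEAK_LR = 5e-6
WARMUP_STEPS = 200
TOTAL_STEPS = 130_000
BATCH_SIZE = 16
SEQ_LEN = 4096


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


for s in (0, 100, 200, 65_100, 130_000):
    print(f"step {s:>7,}: lr = {learning_rate(s):.2e}")

# Upper bound on token positions processed (every sequence filled to SEQ_LEN).
print(f"{TOTAL_STEPS * BATCH_SIZE * SEQ_LEN:,} token positions")  # 8,519,680,000
```

Under these assumptions the rate is 2.5e-6 halfway through the decay (step 65,100), and the run covers at most about 8.5B token positions.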
Training Dataset (1.35M samples)
| Dataset | Samples | Purpose |
|---|---|---|
| NousResearch/Hermes-3-Dataset | ~959K | Core uncensored assistant behavior |
| allenai/tulu-3-sft-mixture | ~200K | Diverse instruction following |
| HuggingFaceTB/smoltalk (magpie-ultra) | ~100K | High quality diverse tasks |
| HuggingFaceTB/smoltalk (numina-cot) | ~50K | Math reasoning |
| HuggingFaceTB/smoltalk (self-oss-instruct) | ~50K | Code generation |
| LDJnr/Capybara | ~16K | Multi-turn conversations |
All data was filtered to remove refusal patterns, safety-alignment subsets, and <think> reasoning tags.
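The filtering code itself is not published. As an illustration only, a per-sample filter of this kind might strip `<think>...</think>` spans with a regular expression and drop conversations whose assistant turns contain common refusal phrases (removing safety-alignment subsets happens at the dataset level and is not shown). The phrase list, the `clean_sample` name, and the `role`/`content` message format are hypothetical.

```python
import re

# Hypothetical refusal markers; the real filter's patterns are not published.
REFUSAL_PATTERNS = re.compile(
    r"\b(I can(?:'|no)t help with|I'm sorry, but|As an AI language model)\b",
    re.IGNORECASE,
)
# A <think>...</think> block (non-greedy, spanning newlines) plus trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def clean_sample(messages):
    """Return the messages with <think> blocks removed, or None if it looks like a refusal."""
    cleaned = []
    for msg in messages:
        content = THINK_BLOCK.sub("", msg["content"])
        if msg["role"] == "assistant" and REFUSAL_PATTERNS.search(content):
            return None  # drop the whole conversation
        cleaned.append({**msg, "content": content})
    return cleaned
```

For example, an assistant turn of `"<think>easy</think>\n4"` becomes `"4"`, while a conversation whose assistant turn starts with "I'm sorry, but" is dropped entirely. Only assistant turns are checked, so user prompts that happen to contain such phrases are kept.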
Abliteration Phase
After SFT, the model was abliterated using the weight orthogonalization technique from Arditi et al. (2024) to remove residual refusal behavior.
- Technique: Multi-direction abliteration (weight orthogonalization)
- Directions removed: 5
- Target layers: 35, 34, 36, 33, 16 (selected by highest refusal direction scores)
- Samples used: 256 harmful/harmless instruction pairs
- Method: For each selected layer, the refusal direction was identified via mean difference between harmful and harmless activations, then orthogonalized out of the weight matrices.
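The mean-difference and orthogonalization steps above can be sketched with NumPy on toy arrays. For a single unit direction r, the weight orthogonalization of Arditi et al. (2024) replaces each matrix W that writes into the residual stream with W - r r^T W. How the five per-layer directions were combined for this model is not stated; projecting out their orthonormalized span (via QR) is an assumption, and the function names, shapes, and random data are hypothetical.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean activations; inputs shaped (n_samples, d_model)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(weight: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Remove the span of `directions` (k, d_model) from a matrix that writes to the
    residual stream, stored as (d_model, d_in) like a PyTorch Linear weight."""
    # Orthonormal basis Q (d_model, k) for the span of the k directions (assumed combination).
    q, _ = np.linalg.qr(directions.T)
    # W' = (I - Q Q^T) W, so W' has no output component along any direction.
    return weight - q @ (q.T @ weight)


rng = np.random.default_rng(0)
d_model, d_in, n = 64, 32, 256
dirs = []
for _ in range(5):  # one toy direction per "selected layer"
    harmful = rng.normal(size=(n, d_model)) + rng.normal(size=d_model)  # shifted mean
    harmless = rng.normal(size=(n, d_model))
    dirs.append(refusal_direction(harmful, harmless))
directions = np.stack(dirs)

w = rng.normal(size=(d_model, d_in))
w_abl = orthogonalize(w, directions)
print(np.abs(directions @ w_abl).max())  # ~0: no output along any refusal direction
```

With a single direction, Q is just ±r and the update reduces to W - r r^T W. Because the edit is baked into the weights, no inference-time hook is needed; in Arditi et al. it is applied to every matrix that writes into the residual stream.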
Benchmark Results
Evaluated using lm-evaluation-harness on 200 samples per task, 5-shot (except TruthfulQA, which is 0-shot).
| Benchmark | Metric | Score |
|---|---|---|
| ARC-Challenge | acc | 56.5% |
| ARC-Challenge | acc_norm | 54.0% |
| HellaSwag | acc_norm | 64.5% |
| TruthfulQA MC2 | acc | 48.8% |
| Winogrande | acc | 57.0% |
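Because each task was scored on a 200-sample subset, the scores carry noticeable sampling error. A quick sketch using the normal approximation to the binomial standard error, sqrt(p(1 - p) / n); it treats samples as independent and ignores variance from few-shot example selection. TruthfulQA MC2 is an averaged probability rather than a 0/1 accuracy, so for that row the formula gives an upper bound.

```python
import math

N = 200  # samples per task, as stated above

# Scores from the table above, as fractions.
scores = {
    "ARC-Challenge (acc)": 0.565,
    "ARC-Challenge (acc_norm)": 0.540,
    "HellaSwag (acc_norm)": 0.645,
    "TruthfulQA MC2 (acc)": 0.488,
    "Winogrande (acc)": 0.570,
}


def std_error(p: float, n: int = N) -> float:
    """Binomial standard error of an accuracy p measured on n samples."""
    return math.sqrt(p * (1 - p) / n)


for name, p in scores.items():
    half_width = 1.96 * std_error(p)  # approximate 95% confidence interval
    print(f"{name:<26} {p:.1%} ± {half_width:.1%}")
```

At n = 200 every interval is roughly ±7 percentage points, so differences of a few points against other models evaluated this way are within noise.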
GGUF Quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| dolphin-v2-8b-abliterated-Q8_0.gguf | Q8_0 | 8.3 GB | Best quality quantization |
| dolphin-v2-8b-abliterated-Q4_K_M.gguf | Q4_K_M | 4.8 GB | Good balance of quality and size |
Usage with llama.cpp
llama-server -m dolphin-v2-8b-abliterated-Q8_0.gguf -ngl 99 -c 4096
Usage with Ollama
# Create a Modelfile
echo 'FROM ./dolphin-v2-8b-abliterated-Q8_0.gguf' > Modelfile
ollama create dolphin-v2-abliterated -f Modelfile
ollama run dolphin-v2-abliterated
Usage with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "0arch-io/dolphin-v2-8b-abliterated"
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, how are you?"}]
# Apply the chat template and append the assistant prefix so the model starts its reply
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
Disclaimer
This is a research model with no content filters. It will comply with any request without refusing. The creators are not responsible for how this model is used. Use responsibly.
Acknowledgments
- Qwen team for the Qwen3-8B base model
- Google TRC for TPU compute
- NousResearch for the Hermes-3 dataset
- Arditi et al. for the abliteration technique
- Built with MaxText on Google Cloud TPU