---
license: apache-2.0
base_model:
- ByteDance-Seed/Seed-OSS-36B-Instruct
pipeline_tag: text-generation
---

# YanLabs/Seed-OSS-36B-Instruct-MPOA

This is an abliterated version of [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct), produced with the norm-preserving biprojected abliteration technique.

**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.

## Model Details

### Model Description

This model was produced by applying **norm-preserving biprojected abliteration**, which aims to remove refusal behaviors while preserving the base model's original capabilities. The technique removes "refusal directions" from the model's activation space without traditional fine-tuning.

- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)
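
As a rough illustration of what removing a direction while preserving norm can mean, the minimal Python sketch below ablates a direction from a single vector and then rescales the result to the vector's original length. It is not the code of llm-abliteration. The names `ablate_direction`, `dot`, `norm`, and `normalize` are hypothetical, and the refusal direction is taken as given. The biprojection step and the way the tool applies the operation to weight matrices are not shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Return a unit-length copy of a; reject the zero vector."""
    n = norm(a)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]


def ablate_direction(v, direction, preserve_norm=True):
    """Remove the component of v along direction (hypothetical helper).

    With preserve_norm=True the result is rescaled so that its length
    matches the length of the original v. This is an assumed reading of
    "norm-preserving", not a description of any specific tool.
    """
    r = normalize(direction)
    coeff = dot(v, r)
    out = [x - coeff * y for x, y in zip(v, r)]
    if preserve_norm:
        n_out = norm(out)
        if n_out > 0.0:
            scale = norm(v) / n_out
            out = [x * scale for x in out]
    return out


if __name__ == "__main__":
    # Toy 2-D example: remove the x-axis component, keep the length 5.
    print(ablate_direction([3.0, 4.0], [1.0, 0.0]))  # [0.0, 5.0]
```

After the call, the result has no component along the given direction, and its length equals that of the input vector.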

### Model Sources

- **Base Model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)

## Uses

### Intended Use

- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques

### Out-of-Scope Use

- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes

## Limitations

- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{Seed-OSS-36B-Instruct-MPOA,
  author = {YanLabs},
  title = {Seed-OSS-36B-Instruct-MPOA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Seed-OSS-36B-Instruct-MPOA}},
  note = {Abliterated using norm-preserving biprojected technique}
}