---
license: apache-2.0
base_model:
- ByteDance-Seed/Seed-OSS-36B-Instruct
pipeline_tag: text-generation
---

# YanLabs/Seed-OSS-36B-Instruct-MPOA

This is an abliterated version of [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct) using the norm-preserving biprojected abliteration technique.

**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.

## Model Details

### Model Description

This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.
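The core idea can be sketched as follows. This is a simplified, single-matrix illustration with assumed helper names, not the actual llm-abliteration implementation, which operates across many layers and applies the projection at both ends (the "biprojected" part; see the linked write-up for details):

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Estimate a unit 'refusal direction' as the difference of mean hidden
    activations over harmful vs. harmless prompts (illustrative)."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def abliterate_norm_preserving(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the component of each weight row along d, then rescale every
    row back to its original L2 norm (the norm-preserving step)."""
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - np.outer(W @ d, d)  # rank-1 update: rows become orthogonal to d
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    return W_proj * (orig_norms / np.clip(new_norms, 1e-12, None))
```

Because each row is rescaled to its original norm, the layer's overall scale is unchanged; only the component along the estimated refusal direction is removed.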

- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)

### Model Sources

- **Base Model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)

## Uses

### Intended Use

- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques

### Out-of-Scope Use

- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes

## Limitations

- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{Seed-OSS-36B-Instruct-MPOA,
  author = {YanLabs},
  title = {Seed-OSS-36B-Instruct-MPOA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Seed-OSS-36B-Instruct-MPOA}},
  note = {Abliterated using norm-preserving biprojected technique}
}
```