hell0ks committed · Commit ef95e60 · verified · 1 Parent(s): 5c37849

Create README.md

Files changed (1)
  1. README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ language:
+ - en
+ - ko
+ license: other
+ license_name: solar-apache-2.0
+ tags:
+ - solar
+ - moe
+ - abliterated
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model:
+ - upstage/Solar-Open-100B
+ ---
+ # Overview
+
+ This is a modified version of [Solar-Open-100B](https://huggingface.co/upstage/Solar-Open-100B), produced with the Multi-Directional Refusal Suppression methodology.
+
+ # Why?
+ 1. I found this model's safety policy to be almost as strict as that of *GPT-OSS*, which severely restricts usage.
+ 2. To test whether the SOM-based method is viable on tricky cases.
+
+ # How I did it
+
+ - Extract Self-Organizing Maps (SOMs) from 3 test sets at different layers
+   - Hard prompts: "I'm sorry, I can't help."
+   - Soft prompts: "I'm sorry, I can't help. But.."
+   - Policy refusals (reasoning): "It is ~. It's against ~ policy."
+ - Apply the conventional abliteration method using the resulting vectors
+
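The steps above can be sketched roughly as follows. This is a toy, pure-Python illustration under stated assumptions, not the code used to produce this model. It assumes that each refusal direction is a SOM prototype fitted to harmful-prompt activations minus the mean harmless activation, and that "conventional abliteration" means projecting those directions out of a weight matrix that writes into the residual stream. All function names (`train_som`, `refusal_directions`, `ablate`), the 1-D SOM grid, and the hyperparameters are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_som(samples, n_units=3, epochs=50, lr=0.5, seed=0):
    """Fit a tiny 1-D SOM and return its n_units prototype vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        sigma = max(frac * n_units / 2, 1e-3)  # shrinking neighbourhood radius
        for x in samples:
            # Best matching unit: the prototype closest to this sample.
            bmu = min(range(n_units),
                      key=lambda i: sum((w - a) ** 2 for w, a in zip(units[i], x)))
            for i in range(n_units):
                # Units near the BMU on the 1-D grid move more.
                step = lr * frac * math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                units[i] = [w + step * (a - w) for w, a in zip(units[i], x)]
    return units


def refusal_directions(harmful_acts, harmless_acts, n_units=3):
    """Assumed recipe: SOM prototypes of harmful activations minus harmless mean."""
    center = mean(harmless_acts)
    units = train_som(harmful_acts, n_units=n_units)
    return [[u - c for u, c in zip(unit, center)] for unit in units]


def orthonormal_basis(directions, eps=1e-8):
    """Gram-Schmidt, so overlapping directions are removed as one subspace."""
    basis = []
    for d in directions:
        v = list(d)
        for q in basis:
            c = dot(v, q)
            v = [x - c * y for x, y in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:
            basis.append([x / norm for x in v])
    return basis


def ablate(weight, directions):
    """Project span(directions) out of the output space of `weight`.

    `weight` has d_model rows and d_in columns; each column is a vector the
    layer can write into the residual stream.
    """
    cols = [list(c) for c in zip(*weight)]
    for q in orthonormal_basis(directions):
        cols = [[x - dot(col, q) * y for x, y in zip(col, q)] for col in cols]
    return [list(row) for row in zip(*cols)]


def noise(rng, dim=4, scale=0.05):
    return [rng.gauss(0, scale) for _ in range(dim)]


# Toy 4-D "activations": harmless near the origin, harmful shifted on axis 0 or 1.
rng = random.Random(1)
harmless = [noise(rng) for _ in range(20)]
harmful = [[a + 2, b, c, d] for a, b, c, d in (noise(rng) for _ in range(10))]
harmful += [[a, b + 2, c, d] for a, b, c, d in (noise(rng) for _ in range(10))]

directions = refusal_directions(harmful, harmless, n_units=2)
weight = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # d_model=4, d_in=3
ablated = ablate(weight, directions)
```

After `ablate`, no column of `ablated` has a component along any of the extracted directions. Orthonormalizing first is a design choice in this sketch: projecting out non-orthogonal directions one at a time would not remove their whole span. On a real model the same projection would be applied with a tensor library to each layer's output weights.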
+ # Results
+
+ - Hard prompts: The model still tries to check the request against policy, but eventually becomes confused about the policy itself.
+ - Soft prompts: The model responds without checking policy.
+
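One rough way to quantify observations like these is to tally responses by refusal style. The sketch below is a hypothetical keyword heuristic, not the evaluation used for this model: the category names and regular expressions are assumptions, loosely based on the example refusal phrases quoted in this README, and they will misclassify some responses.

```python
import re
from collections import Counter

# Assumed marker patterns, derived from the example refusals quoted in this README.
REFUSAL = re.compile(r"i'?m sorry|i can(?:'t|not) help", re.IGNORECASE)
POLICY = re.compile(r"against\b.*\bpolicy", re.IGNORECASE | re.DOTALL)
SOFT = re.compile(r"\bbut\b", re.IGNORECASE)


def classify(response: str) -> str:
    """Label a response as policy_refusal, soft_refusal, hard_refusal, or complied."""
    text = response.replace("\u2019", "'")  # normalize curly apostrophes
    if POLICY.search(text):
        return "policy_refusal"
    if REFUSAL.search(text):
        return "soft_refusal" if SOFT.search(text) else "hard_refusal"
    return "complied"


def tally(responses):
    """Count how many responses fall into each category."""
    return Counter(classify(r) for r in responses)
```

Running `tally` over the outputs of the base model and of this model on the same prompt set gives a crude before/after comparison; manual review is still needed, since keyword matching cannot tell a confused answer from a helpful one.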
+ # Side Effects
+
+ - May hallucinate more
+ - May have some impact on the model's intelligence
+
+ # Notice
+
+ - I can't guarantee that the model will comply with *every* prompt; it may still refuse some.
+ - You should **NOT** use it in any production environment.
+ - As per the license:
+   - This model is Built with Solar and is a Derivative AI Model of Solar-Open-100B.
+   - You can find a copy of the license (Solar-Apache License, Version 2.0) in the LICENSE file.
+
+ # Acknowledgement
+
+ - [arXiv:2511.08379 [cs.AI]](https://arxiv.org/abs/2511.08379v2)