hell0ks
/

Solar-Open-100B-jailbreak

Text Generation

Mixture of Experts


hell0ks committed 3 days ago

Commit

ef95e60

·

verified ·

1 Parent(s): 5c37849

Create README.md

Files changed (1)

README.md +52 -0

README.md ADDED

	@@ -0,0 +1,52 @@

+---
+language:
+- en
+- ko
+license: other
+license_name: solar-apache-2.0
+tags:
+- solar
+- moe
+- abliterated
+library_name: transformers
+pipeline_tag: text-generation
+base_model:
+- upstage/Solar-Open-100B
+---
+# Overview
+This is a modified version of [Solar-Open-100B](https://huggingface.co/upstage/Solar-Open-100B), produced with the Multi-Directional Refusal Suppression method.
+# Why?
+1. I found this model's safety policy to be almost at the *GPT-OSS* level, which severely restricts usage.
+2. To test whether the SOM-based method is viable on tricky cases.
+# How I did it
+- Extract Self-Organizing Maps (SOMs) from 3 test sets at different layers
+  - Hard Prompts: "I'm sorry, I can't help."
+  - Soft Prompts: "I'm sorry, I can't help. But.."
+  - Policy refusals (Reasoning): "It is ~. It's against ~ policy."
+- Apply the conventional abliteration method using the resulting vectors
+# Result
+- Hard prompts: The model still tries to check the prompt against policy, but eventually becomes confused about the policy itself.
+- Soft prompts: The model responds without checking policy.
+# Side Effects
+- May hallucinate more
+- May have some impact on the model's intelligence
+# Notice
+- I can't guarantee that the model will comply with *every* prompt; it may still refuse some.
+- You should **NOT** use it in any production environment.
+- As per license:
+  - This model is Built with Solar and is a Derivative AI Model of Solar-Open-100B.
+  - You can find a copy of the license (Solar-Apache License, Version 2.0) in the LICENSE file.
+# Acknowledgement
+- [arXiv:2511.08379 [cs.AI]](https://arxiv.org/abs/2511.08379v2)
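
Reviewer note (not part of the committed README): the "How I did it" section gives no code, so the following is only a minimal sketch of the linear algebra it describes, run on synthetic vectors rather than real model activations. Everything in it is an assumption made for illustration: the function names `train_som`, `refusal_directions`, and `ablate` are hypothetical, the directions are taken as SOM node weights minus the mean of a contrast set, and the "conventional" step is read as projecting those directions out of a weight matrix. It is not the author's pipeline.

```python
import numpy as np

def train_som(data, n_nodes=4, epochs=20, lr=0.5, sigma=1.0, seed=0):
    """Train a tiny 1-D self-organizing map; returns (n_nodes, dim) node weights."""
    rng = np.random.default_rng(seed)
    # Initialise nodes from randomly chosen data points.
    nodes = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrink learning rate and neighbourhood over time
        width = sigma * decay + 1e-3
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(nodes - x, axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * width ** 2))  # neighbourhood weights
            nodes += (lr * decay) * h[:, None] * (x - nodes)
    return nodes

def refusal_directions(set_a, set_b, n_nodes=4):
    """Assumed recipe: SOM nodes of set_a minus the mean of set_b, orthonormalised."""
    diffs = train_som(set_a, n_nodes=n_nodes) - set_b.mean(axis=0)
    q, _ = np.linalg.qr(diffs.T)  # columns of q form an orthonormal basis
    return q.T  # shape (n_nodes, dim), rows orthonormal

def ablate(weight, directions):
    """Project the span of `directions` out of the output space of `weight`."""
    projector = directions.T @ directions  # (dim, dim) orthogonal projector
    return weight - projector @ weight

# Synthetic stand-ins for two sets of layer activations (dimension 16).
rng = np.random.default_rng(1)
set_a = rng.normal(size=(200, 16)) + 2.0
set_b = rng.normal(size=(200, 16))
dirs = refusal_directions(set_a, set_b, n_nodes=3)
w_new = ablate(rng.normal(size=(16, 8)), dirs)
print(np.abs(dirs @ w_new).max())  # close to zero: no output component along dirs
```

Orthonormalising the directions first keeps the projector exact when several directions are removed at once; subtracting non-orthogonal directions one by one would leave residual components.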