Upload README.md with huggingface_hub

Files changed (1)

README.md +108 -104

@@ -1,104 +1,108 @@
----
-license: apache-2.0
-language:
-- en
-base_model: moonshotai/Moonlight-16B-A3B-Instruct
-tags:
-- text-generation
-- conversational
-- moe
-- abliterated
-- uncensored
-- bruno
-pipeline_tag: text-generation
-library_name: transformers
----
-# Moonlight-16B-A3B-Instruct-Bruno (Abliterated)
-Abliterated version of [moonshotai/Moonlight-16B-A3B-Instruct](https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct) with reduced refusals using MoE gate abliteration.
-## Model Details
-- **Base Model:** moonshotai/Moonlight-16B-A3B-Instruct
-- **Modification:** MoE gate abliteration using [Bruno](https://github.com/quanticsoul4772/abliteration-workflow)
-- **Architecture:** Mixture of Experts (MoE)
-- **Parameters:** 16B total, 3B active
-## Abliteration Results
-| Metric | Value |
-|--------|-------|
-| **Refusal Reduction** | 76/104 prompts answered (73% success rate) |
-| **KL Divergence** | 0.33 (low divergence = capabilities preserved) |
-| **Optuna Trials** | 201 |
-## Benchmark Results
-Benchmarks run on 2x RTX 4090 GPUs to verify capability preservation after abliteration.
-### Comparison with Previous Abliterated Model
-| Benchmark | Bruno Model | Previous Model | Change |
-|-----------|-------------|----------------|--------|
-| **MMLU Overall** | **48.7%** (73/150) | 48.0% (72/150) | **+0.7%** ✅ |
-| **HellaSwag** | **58.0%** (116/200) | 56.0% (112/200) | **+2.0%** ✅ |
-| **GSM8K** | **55.0%** (55/100) | 51.0% (51/100) | **+4.0%** ✅ |
-### MMLU Breakdown
-| Subject | Score |
-|---------|-------|
-| abstract_algebra | 20.0% (6/30) |
-| high_school_physics | 40.0% (12/30) |
-| high_school_chemistry | 60.0% (18/30) |
-| computer_security | 83.3% (25/30) |
-| machine_learning | 40.0% (12/30) |
-## Key Findings
-✅ **Capabilities Preserved:** All benchmarks show equal or improved performance after abliteration
-✅ **MMLU:** Knowledge and reasoning slightly improved (+0.7%)
-✅ **HellaSwag:** Commonsense reasoning improved (+2.0%)
-✅ **GSM8K:** Mathematical reasoning improved (+4.0%)
-✅ **Refusals Reduced:** From ~100% refusal rate to 27% on test prompts
-## Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import torch
-model = AutoModelForCausalLM.from_pretrained(
-    "rawcell/Moonlight-16B-A3B-Instruct-bruno",
-    torch_dtype=torch.float16,
-    device_map="auto",
-    trust_remote_code=True
-)
-tokenizer = AutoTokenizer.from_pretrained(
-    "rawcell/Moonlight-16B-A3B-Instruct-bruno",
-    trust_remote_code=True
-)
-messages = [{"role": "user", "content": "Your prompt here"}]
-prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-## Hardware Requirements
-- **Minimum VRAM:** 32GB (with quantization)
-- **Recommended:** 48GB+ or 2x 24GB GPUs
-- **Tested on:** 2x RTX 4090 (48GB total)
-## Disclaimer
-This model has been modified to reduce refusals. Use responsibly and in accordance with applicable laws and ethical guidelines. The creators are not responsible for misuse.
-## Acknowledgments
-- Base model by [Moonshot AI](https://huggingface.co/moonshotai)
-- Abliteration technique from [Heretic](https://github.com/p-e-w/heretic)
-- MoE gate abliteration implementation: Bruno

+---
+license: apache-2.0
+language:
+- en
+base_model: moonshotai/Moonlight-16B-A3B-Instruct
+tags:
+- text-generation
+- abliterated
+- bruno
+- heretic
+- decensored
+- optuna-optimized
+- moonlight
+- moe
+- conversational
+- uncensored
+pipeline_tag: text-generation
+library_name: transformers
+---
+# Moonlight-16B-A3B-Instruct-Bruno (Abliterated)
+Abliterated version of [moonshotai/Moonlight-16B-A3B-Instruct](https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct) with reduced refusals using MoE gate abliteration.
+## Model Details
+- **Base Model:** moonshotai/Moonlight-16B-A3B-Instruct
+- **Modification:** MoE gate abliteration using [Bruno](https://github.com/quanticsoul4772/abliteration-workflow)
+- **Architecture:** Mixture of Experts (MoE)
+- **Parameters:** 16B total, 3B active
+## Abliteration Results
+| Metric | Value |
+|--------|-------|
+| **Refusal Reduction** | 76/104 prompts answered (73% success rate) |
+| **KL Divergence** | 0.33 (lower is better; a low value suggests capabilities are preserved) |
+| **Optuna Trials** | 201 |
+## Benchmark Results
+Benchmarks run on 2x RTX 4090 GPUs to verify capability preservation after abliteration.
+### Comparison with Previous Abliterated Model
+| Benchmark | Bruno Model | Previous Model | Change |
+|-----------|-------------|----------------|--------|
+| **MMLU Overall** | **48.7%** (73/150) | 48.0% (72/150) | **+0.7%** ✅ |
+| **HellaSwag** | **58.0%** (116/200) | 56.0% (112/200) | **+2.0%** ✅ |
+| **GSM8K** | **55.0%** (55/100) | 51.0% (51/100) | **+4.0%** ✅ |
+### MMLU Breakdown
+| Subject | Score |
+|---------|-------|
+| abstract_algebra | 20.0% (6/30) |
+| high_school_physics | 40.0% (12/30) |
+| high_school_chemistry | 60.0% (18/30) |
+| computer_security | 83.3% (25/30) |
+| machine_learning | 40.0% (12/30) |
+## Key Findings
+✅ **Capabilities Preserved:** All three benchmarks score at or above the previous abliterated model
+✅ **MMLU:** Knowledge and reasoning slightly improved (+0.7 percentage points, 73 vs. 72 of 150)
+✅ **HellaSwag:** Commonsense reasoning improved (+2.0 percentage points, 116 vs. 112 of 200)
+✅ **GSM8K:** Mathematical reasoning improved (+4.0 percentage points, 55 vs. 51 of 100)
+✅ **Refusals Reduced:** From ~100% refusal rate to 27% on test prompts
+## Usage
+```python
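+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load the weights in half precision and let device_map="auto" place them on the available GPUs.
+# trust_remote_code=True allows the model code shipped in the repository to run.
+model = AutoModelForCausalLM.from_pretrained(
+    "rawcell/Moonlight-16B-A3B-Instruct-bruno",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    trust_remote_code=True
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    "rawcell/Moonlight-16B-A3B-Instruct-bruno",
+    trust_remote_code=True
+)
+# Format the conversation with the model's chat template and append the generation prompt.
+messages = [{"role": "user", "content": "Your prompt here"}]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# Sample up to 512 new tokens at temperature 0.7.
+outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
+# Decode the full sequence (prompt plus completion), dropping special tokens.
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))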
+```
+## Hardware Requirements
+- **Minimum VRAM:** 32GB (with quantization)
+- **Recommended:** 48GB+ or 2x 24GB GPUs
+- **Tested on:** 2x RTX 4090 (48GB total)
+## Disclaimer
+This model has been modified to reduce refusals. Use responsibly and in accordance with applicable laws and ethical guidelines. The creators are not responsible for misuse.
+## Acknowledgments
+- Base model by [Moonshot AI](https://huggingface.co/moonshotai)
+- Abliteration technique from [Heretic](https://github.com/p-e-w/heretic)
+- MoE gate abliteration implementation: Bruno
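
The percentages in the README tables above can be recomputed from the raw counts they report. The following is a minimal Python sketch of that cross-check, assuming the counts in the tables are exact. The helper `pct` and all variable names are illustrative and are not part of Bruno, Heretic, or any benchmark harness.

```python
# Raw counts copied from the README tables.
refusal_answered, refusal_total = 76, 104

# (Bruno model, previous model), each as (correct, total).
benchmarks = {
    "MMLU": ((73, 150), (72, 150)),
    "HellaSwag": ((116, 200), (112, 200)),
    "GSM8K": ((55, 100), (51, 100)),
}

# Per-subject MMLU counts as (correct, total).
mmlu_subjects = {
    "abstract_algebra": (6, 30),
    "high_school_physics": (12, 30),
    "high_school_chemistry": (18, 30),
    "computer_security": (25, 30),
    "machine_learning": (12, 30),
}


def pct(correct, total):
    """Return correct/total as a percentage."""
    return 100.0 * correct / total


# Share of test prompts answered, and the remaining refusal rate.
answered_rate = pct(refusal_answered, refusal_total)
refusal_rate = 100.0 - answered_rate

# Change per benchmark, in percentage points.
deltas = {
    name: pct(*new) - pct(*old) for name, (new, old) in benchmarks.items()
}

# The subject counts should add up to the MMLU overall row.
mmlu_correct = sum(correct for correct, _ in mmlu_subjects.values())
mmlu_total = sum(total for _, total in mmlu_subjects.values())

print(f"answered: {answered_rate:.1f}%, refused: {refusal_rate:.1f}%")
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} percentage points")
print(f"MMLU overall from subjects: {mmlu_correct}/{mmlu_total}")
```

Under these assumptions the recomputed values agree with the README: the answered and refused shares round to 73% and 27%, and the five MMLU subject scores add up to the 73/150 in the overall row. The "Change" column is a difference in percentage points, corresponding to 1, 4, and 4 additional correct answers respectively.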