This repository contains a pruned version of Upstage's **Solar-Open-100B**.
* **Pruning Method:** REAP (Router Expert Activation Pruning) based on the [Cerebras Research REAP implementation](https://github.com/CerebrasResearch/reap).
* **Optimization:** Pruned using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset.
* **Hardware:** Pruned on 4x NVIDIA A100 SXM.
* **Custom Chat Template:** Features a specialized template to manage reasoning length and prevent "non-stop" generation issues.
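For a rough sense of the footprint reduction, a back-of-envelope sketch (parameter counts are taken from the model names; real memory use also depends on activations and the KV cache):

```python
# Back-of-envelope weight memory at bf16 (2 bytes per parameter).
BYTES_PER_PARAM = 2  # bf16

def weight_gb(num_params: float) -> float:
    return num_params * BYTES_PER_PARAM / 1e9

original = weight_gb(102e9)  # Solar-Open-100B (~102B params)
pruned = weight_gb(69e9)     # this pruned model (~69B params)
print(f"original: {original:.0f} GB, pruned: {pruned:.0f} GB")
# → original: 204 GB, pruned: 138 GB
```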

## Links to Quants
- [Solar Open 69B REAP GGUF](https://huggingface.co/Akicou/Solar-Open-69B-REAP-GGUF)

---

## Technical Details & Updates

This model was created by modifying a clone of the Cerebras REAP repository. The goal was to reduce the overhead of the 102B MoE architecture while maintaining high performance in core tasks.
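The exact pruning criterion lives in the Cerebras REAP repository; as a toy illustration of the general idea (rank experts by router activation over calibration data and drop the weakest), with all numbers synthetic and the saliency score simplified:

```python
import numpy as np

def select_experts(router_probs: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` experts with the highest mean router activation.

    `router_probs` has shape (num_tokens, num_experts). This mean-activation
    saliency is illustrative only, not the exact REAP criterion.
    """
    saliency = router_probs.mean(axis=0)          # (num_experts,)
    return np.sort(np.argsort(saliency)[-keep:])  # ids of retained experts

# Synthetic calibration routing: 1000 tokens over 8 experts.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(8), size=1000)
kept = select_experts(probs, keep=6)
print(kept)  # ids of the 6 retained experts
```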

### Chat Template Fix (Jan 7, ~19:00)

As of **January 7th, 2026 (~19:00)**, the `chat_template` has been updated and fixed. This resolves issues where the model would enter infinite reasoning loops or produce excessively long responses; the new template deliberately shortens the reasoning phase to keep outputs concise and usable.
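The fix itself lives in the repo's `chat_template`; if you are running an older snapshot, a rough client-side guard can approximate it by truncating a runaway reasoning span after generation (the `<think>` tag names below are hypothetical, not necessarily what this model emits):

```python
def cap_reasoning(text: str, max_reason_chars: int = 2000,
                  open_tag: str = "<think>", close_tag: str = "</think>") -> str:
    """Truncate an overlong (or unterminated) reasoning span in model output."""
    start = text.find(open_tag)
    if start == -1:
        return text  # no reasoning span present
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        end = len(text)  # model never closed its reasoning: cut it off
    reason = text[body_start:end][:max_reason_chars]
    rest = text[end + len(close_tag):] if text.startswith(close_tag, end) else ""
    return text[:start] + open_tag + reason + close_tag + rest

print(cap_reasoning("<think>" + "blah " * 1000 + "</think>Answer: 42", 50))
```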

### Acknowledgments

Special thanks to **[Barney Greenway](https://huggingface.co/McG-221)** for reporting the long-reasoning/non-stop generation issues, which directly led to the template fix.

### Future Roadmap

Future REAP uploads to this profile will include specialized experts for:
* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)

> [!NOTE]
> Due to the pruning and current template constraints, this model is currently **not optimized for complex math**.

---

## Usage

### Transformers

Ensure you have `transformers`, `accelerate`, and `torch` installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Akicou/Solar-Open-69B-REAP"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input
messages = [{"role": "user", "content": "Explain the benefit of MoE pruning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0]))
```

## License

The model weights are licensed under the **Solar-Apache License 2.0**.