--- |
|
|
language: |
|
|
- en |
|
|
- ko |
|
|
license: other |
|
|
license_name: solar-apache-2.0 |
|
|
tags: |
|
|
- upstage |
|
|
- solar |
|
|
- moe |
|
|
- llm |
|
|
- pruning |
|
|
- reap |
|
|
base_model: |
|
|
- upstage/Solar-Open-100B |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<img src="./Solar-Open-69B-REAP.png" alt="Solar Open 69B REAP" width="100%"> |
|
|
</p> |
|
|
|
|
|
> [!NOTE] |
|
|
> Due to the specific pruning focus, this current version is **not optimized for math**. |
|
|
|
|
|
|
|
|
|
|
|
# Solar-Open-69B-REAP
|
|
|
|
|
This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from ~102B parameters down to a highly efficient **69B parameters**.
|
|
|
|
|
## Model Highlights |
|
|
|
|
|
* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
|
|
* **Optimization:** Pruned on **4x NVIDIA A100 SXM** using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset. |
|
|
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and runaway ("non-stop yapping") generation.
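To make the pruning criterion concrete, here is a minimal sketch of a REAP-style expert saliency score: each expert is ranked by its average router weight times the magnitude of its output over calibration tokens, and the lowest-scoring experts are dropped. The function names, shapes, and toy data below are illustrative assumptions, not the exact Cerebras implementation:

```python
import numpy as np

def reap_scores(gate_probs: np.ndarray, expert_out_norms: np.ndarray) -> np.ndarray:
    """Saliency per expert: mean of (router weight * expert output norm)
    over calibration tokens. Higher = more worth keeping."""
    # gate_probs: (tokens, experts) routing weights
    # expert_out_norms: (tokens, experts) stand-in for ||expert(x)||
    return (gate_probs * expert_out_norms).mean(axis=0)

def prune_experts(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return the sorted indices of the experts to keep."""
    n_keep = max(1, round(len(scores) * keep_ratio))
    return np.sort(np.argsort(scores)[::-1][:n_keep])

# Toy calibration pass: 128 tokens, 8 experts
rng = np.random.default_rng(0)
gates = rng.random((128, 8))
gates /= gates.sum(axis=1, keepdims=True)  # normalize router weights per token
norms = rng.random((128, 8))

kept = prune_experts(reap_scores(gates, norms), keep_ratio=0.69)
print(kept)  # indices of the retained experts
```

In the real pipeline the scores come from forward passes over the calibration set (here, ~100 Evol-Instruct-Code samples) rather than random data.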
|
|
|
|
|
## Recommended Quants |
|
|
|
|
|
For GGUF usage, we recommend these versions from the **mradermacher** team: |
|
|
|
|
|
- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF) |
|
|
- [Solar-Open-69B-REAP-GGUF (Standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF) |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Details & Acknowledgments |
|
|
|
|
|
The goal of this REAP pass was to reduce the overhead of the massive ~102B-parameter MoE architecture while preserving strong instruction-following and coding performance.
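As a rough back-of-the-envelope on that overhead reduction (assuming 16-bit weights and counting only the weights themselves, not KV cache or quantized GGUF sizes):

```python
GIB = 1024 ** 3

def weight_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Raw weight memory in GiB (2 bytes/param for fp16/bf16)."""
    return n_params * bytes_per_param / GIB

before = weight_gib(102e9)  # original ~102B MoE
after = weight_gib(69e9)    # after REAP
print(f"{before:.0f} GiB -> {after:.0f} GiB (saves {before - after:.0f} GiB)")
```

This is roughly a 190 GiB to 129 GiB reduction in raw weight memory, before any quantization.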
|
|
|
|
|
### Chat Template Fix |
|
|
The custom chat template implemented on **Jan 7th** deliberately shortens the model's reasoning traces. This keeps the model on task and produces concise responses instead of long-winded internal monologues.
|
|
|
|
|
**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template. |
|
|
|
|
|
### Future Roadmap |
|
|
Any future REAP uploads to this profile will include specialized experts for: |
|
|
* Advanced Mathematics |
|
|
* Function-calling |
|
|
* SWE environments (software engineering)
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Usage with llama-cpp-python |
|
|
|
|
|
To ensure the **Jan 7th template fix** is applied (instead of the older template baked into the GGUF metadata), we recommend rendering prompts with the chat template from this repository's `tokenizer_config.json`, as in the code below:
|
|
|
|
|
```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Initialize the Llama model
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers to GPU
)

# 3. Render the prompt with the fixed template, bypassing the
#    older template stored in the GGUF metadata
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = llm(prompt, max_tokens=512, temperature=0.7)
print(response["choices"][0]["text"])
```
|
|
|
|
|
## License |
|
|
|
|
|
The model weights are licensed under the **Solar-Apache License 2.0**. |
|
|
|
|
|
|