---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- llm
- pruning
- reap
base_model:
- upstage/Solar-Open-100B
---
<p align="center">
<img src="./Solar-Open-69B-REAP.png" alt="Solar Open 69B REAP" width="100%">
</p>
> [!NOTE]
> Due to the code-focused pruning calibration, the current version is **not optimized for math**.
# Solar-Open-69B-REAP
This repository contains a pruned version of Upstage's flagship **Solar-Open-100B**. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from its original ~102B parameters to a highly efficient **69B-parameter** model.
## Model Highlights
* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation; an illustrative sketch follows this list.
* **Pruning Setup:** Pruned on **4x NVIDIA A100 SXM** GPUs using ~100 calibration samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset.
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and "non-stop yapping" issues.
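To make the pruning criterion concrete, here is a minimal, self-contained sketch of router-weighted expert activation scoring. It is only an illustration of the idea, not the Cerebras implementation: the shapes, the `keep_ratio`, and the toy calibration data below are all made up.
```python
import torch

def score_experts(router_probs, expert_outputs):
    """Router-weighted expert activation scores for one MoE layer.

    router_probs:   (tokens, n_experts)           gate probabilities (hypothetical shape)
    expert_outputs: (tokens, n_experts, d_model)  per-expert outputs for the same tokens
    """
    # Weight each expert's output magnitude by the router probability it received,
    # then average over the calibration tokens.
    norms = expert_outputs.norm(dim=-1)        # (tokens, n_experts)
    return (router_probs * norms).mean(dim=0)  # (n_experts,)

def experts_to_keep(scores, keep_ratio=0.5):
    """Keep the top-scoring fraction of experts (keep_ratio is purely illustrative)."""
    k = max(1, int(scores.numel() * keep_ratio))
    return torch.topk(scores, k).indices.sort().values

# Toy calibration data: 16 experts, 512 tokens, hidden size 128 (all made up)
router_probs = torch.rand(512, 16).softmax(dim=-1)
expert_outputs = torch.randn(512, 16, 128)
keep = experts_to_keep(score_experts(router_probs, expert_outputs))
print("experts kept:", keep.tolist())
```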
## Recommended Quants
For GGUF usage, we recommend these versions from the **mradermacher** team (a download sketch follows the list):
- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF)
- [Solar-Open-69B-REAP-GGUF (Standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF)
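A quick way to fetch one of these quants programmatically is `huggingface_hub`; the filename below is an assumption, so check the repo's file list for the quant (and shards) you actually want:
```python
from huggingface_hub import hf_hub_download

# Hypothetical filename -- browse the mradermacher repo for the exact GGUF you need.
gguf_path = hf_hub_download(
    repo_id="mradermacher/Solar-Open-69B-REAP-GGUF",
    filename="Solar-Open-69B-REAP.Q4_K_M.gguf",
)
print("Downloaded to:", gguf_path)
```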
---
## Technical Details & Acknowledgments
The goal of this REAP pass was to reduce the memory and compute overhead of the massive ~102B-parameter MoE architecture while maintaining strong instruction-following and coding performance.
### Chat Template Fix
The custom chat template implemented on **Jan 7th** reins in the reasoning length. This keeps the model on task and produces concise responses rather than letting it get lost in long-winded internal monologues.
**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template.
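To see what the fixed template actually renders before wiring it into a runtime, a quick check with `transformers` looks roughly like this (the example messages are arbitrary):
```python
from transformers import AutoTokenizer

# The tokenizer in this repo carries the Jan 7th chat-template fix.
tokenizer = AutoTokenizer.from_pretrained("Akicou/Solar-Open-69B-REAP")

messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Summarize REAP in one sentence."},
]

# Render the prompt as text (no tokenization) to inspect exactly what the model sees.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```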
### Future Roadmap
Any future REAP uploads to this profile will include specialized experts for:
* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)
---
## Usage with llama-cpp-python
To ensure the **Jan 7th template fix** is applied (overriding the older metadata baked into the GGUF), we recommend pulling the template directly from this repository's `tokenizer_config.json` and wiring it into llama-cpp-python through a chat handler, as shown below:
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
fixed_template = tokenizer.chat_template

# 2. Wrap the template in a chat handler (llama-cpp-python applies custom
#    Jinja templates via a chat handler passed to the Llama constructor)
chat_handler = Jinja2ChatFormatter(
    template=fixed_template,
    bos_token=tokenizer.bos_token or "",
    eos_token=tokenizer.eos_token or "",
).to_chat_handler()

# 3. Initialize the Llama model with the injected template
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers to GPU
    chat_handler=chat_handler,
)

# 4. Create a chat completion using the Jan 7th template fix
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
response = llm.create_chat_completion(
    messages=messages,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
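Note that llama-cpp-python applies a custom Jinja template through a chat handler supplied when the `Llama` object is constructed, so the fixed template governs every `create_chat_completion` call made with that instance.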
## License
The model weights are licensed under the **Solar-Apache License 2.0**.