---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- llm
- pruning
- reap
base_model:
- upstage/Solar-Open-100B
---
<p align="center">
<img src="./Solar-Open-69B-REAP.png" alt="Solar Open 69B REAP" width="100%">
</p>
> [!NOTE]
> Due to the specific pruning focus, this current version is **not optimized for math**.
# Solar-Open-69B-REAP
This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from its original ~100B parameters down to an efficient **69B parameters**.
## Model Highlights
* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
* **Optimization:** Pruned on **4x NVIDIA A100 SXM** GPUs using ~100 calibration samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset (see the sampling sketch below).
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and "non-stop yapping" issues.
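As a rough illustration of the calibration step above, ~100 samples can be drawn from the dataset as follows. This is a hedged sketch: the `instruction` column name and the random seed are assumptions, and the exact subset used for this pruning run is not published.
```python
from datasets import load_dataset

# Illustrative calibration sampling (~100 examples), matching the setup above.
# The "instruction" column name and seed are assumptions; the exact subset
# used for this model's pruning run is not published.
ds = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")
calib = ds.shuffle(seed=42).select(range(100))
calib_prompts = [row["instruction"] for row in calib]
print(f"Loaded {len(calib_prompts)} calibration prompts")
```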
## Recommended Quants
For GGUF usage, we recommend these versions from the **mradermacher** team:
- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF)
- [Solar-Open-69B-REAP-GGUF (Standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF)
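If you prefer to fetch a quant programmatically, the snippet below uses `huggingface_hub`. The filename is illustrative only; large quants are often split into multiple parts, so check the quant repo's file listing for the exact name(s).
```python
from huggingface_hub import hf_hub_download

# Download one GGUF quant from the mradermacher repo.
# NOTE: the filename below is illustrative -- verify it against the repo's
# file list before downloading.
gguf_path = hf_hub_download(
    repo_id="mradermacher/Solar-Open-69B-REAP-GGUF",
    filename="Solar-Open-69B-REAP.Q4_K_M.gguf",
)
print(gguf_path)  # local cache path to pass to llama.cpp / llama-cpp-python
```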
---
## Technical Details & Acknowledgments
The goal of this REAP run was to reduce the overhead of the massive 102B-parameter MoE architecture while maintaining its strong instruction-following and coding performance.
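For intuition, here is a minimal, hypothetical sketch of the kind of expert saliency score a REAP-style method computes: each expert is ranked by its router-gate-weighted activation magnitude over the calibration tokens, and the lowest-scoring experts are pruned. Tensor names and shapes are placeholders, not the Cerebras implementation.
```python
import torch

def reap_expert_saliency(gate_probs: torch.Tensor, expert_outputs: torch.Tensor) -> torch.Tensor:
    """Score each expert by its router-weighted activation norm.

    gate_probs:     [num_tokens, num_experts]             router probabilities
    expert_outputs: [num_tokens, num_experts, hidden_dim] per-expert outputs
    """
    output_norms = expert_outputs.norm(dim=-1)   # [tokens, experts]
    weighted = gate_probs * output_norms         # weight by actual router usage
    return weighted.mean(dim=0)                  # one score per expert

def experts_to_keep(saliency: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Return indices of the top-scoring experts to retain."""
    num_keep = max(1, int(round(saliency.numel() * keep_ratio)))
    return torch.topk(saliency, num_keep).indices
```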
### Chat Template Fix
The custom chat template implemented on **Jan 7th** deliberately reins in the reasoning length. This keeps the model on task and produces concise responses rather than letting it get lost in long-winded internal monologues.
**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template.
### Future Roadmap
Any future REAP uploads to this profile will include specialized experts for:
* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)
---
## Usage with llama-cpp-python
To ensure the **Jan 7th template fix** is applied (overriding the older template embedded in the GGUF metadata), we recommend pulling the template directly from this repository's `tokenizer_config.json` and injecting it via a chat handler, as shown below:
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Wrap the fixed template in a chat handler so it overrides the older
#    template stored in the GGUF metadata
chat_handler = Jinja2ChatFormatter(
    template=tokenizer.chat_template,
    eos_token=tokenizer.eos_token,
    bos_token=tokenizer.bos_token or "",
).to_chat_handler()

# 3. Initialize the Llama model with the injected template
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,          # -1 offloads all layers to the GPU
    chat_handler=chat_handler,
)

# 4. Create a chat completion (rendered with the Jan 7th template fix)
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
response = llm.create_chat_completion(
    messages=messages,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
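Injecting the template through a `chat_handler` at initialization means llama-cpp-python renders every prompt with the repository's latest template rather than the older one embedded in the GGUF, so you do not need to re-download the GGUF after a template update.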
## License
The model weights are licensed under the **Solar-Apache License 2.0**.