---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- llm
- pruning
- reap
base_model:
- upstage/Solar-Open-100B
---
<p align="center">
<img src="./Solar-Open-69B-REAP.png" alt="Solar Open 69B REAP" width="100%">
</p>
> [!NOTE]
> Due to the specific pruning focus, this current version is **not optimized for math**.
# Solar-Open-69B-REAP
This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from its original ~100B parameters down to an efficient **69B parameters**.
## Model Highlights
* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
* **Optimization:** Pruned on **4x NVIDIA A100 SXM** GPUs using ~100 calibration samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset (see the sampling sketch below).
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and "non-stop yapping" issues.
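As a rough illustration of the calibration step above, ~100 samples can be drawn from the dataset as follows. This is a hedged sketch: the `instruction` column name and the random seed are assumptions, and the exact subset used for this pruning run is not published.
```python
from datasets import load_dataset

# Illustrative calibration sampling (~100 examples), matching the setup above.
# The "instruction" column name and seed are assumptions; the exact subset
# used for this model's pruning run is not published.
ds = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")
calib = ds.shuffle(seed=42).select(range(100))
calib_prompts = [row["instruction"] for row in calib]
print(f"Loaded {len(calib_prompts)} calibration prompts")
```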
## Recommended Quants
For GGUF usage, we recommend these versions from the **mradermacher** team:
- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF)
- [Solar-Open-69B-REAP-GGUF (Standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF)
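If you prefer to fetch a quant programmatically, the snippet below uses `huggingface_hub`. The filename is illustrative only; large quants are often split into multiple parts, so check the quant repo's file listing for the exact name(s).
```python
from huggingface_hub import hf_hub_download

# Download one GGUF quant from the mradermacher repo.
# NOTE: the filename below is illustrative -- verify it against the repo's
# file list before downloading.
gguf_path = hf_hub_download(
    repo_id="mradermacher/Solar-Open-69B-REAP-GGUF",
    filename="Solar-Open-69B-REAP.Q4_K_M.gguf",
)
print(gguf_path)  # local cache path to pass to llama.cpp / llama-cpp-python
```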
---
## Technical Details & Acknowledgments
The goal of this REAP run was to reduce the overhead of the massive 102B-parameter MoE architecture while maintaining its strong instruction-following and coding performance.
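For intuition, here is a minimal, hypothetical sketch of the kind of expert saliency score a REAP-style method computes: each expert is ranked by its router-gate-weighted activation magnitude over the calibration tokens, and the lowest-scoring experts are pruned. Tensor names and shapes are placeholders, not the Cerebras implementation.
```python
import torch

def reap_expert_saliency(gate_probs: torch.Tensor, expert_outputs: torch.Tensor) -> torch.Tensor:
    """Score each expert by its router-weighted activation norm.

    gate_probs:     [num_tokens, num_experts]             router probabilities
    expert_outputs: [num_tokens, num_experts, hidden_dim] per-expert outputs
    """
    output_norms = expert_outputs.norm(dim=-1)   # [tokens, experts]
    weighted = gate_probs * output_norms         # weight by actual router usage
    return weighted.mean(dim=0)                  # one score per expert

def experts_to_keep(saliency: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Return indices of the top-scoring experts to retain."""
    num_keep = max(1, int(round(saliency.numel() * keep_ratio)))
    return torch.topk(saliency, num_keep).indices
```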
### Chat Template Fix
The custom chat template implemented on **Jan 7th** deliberately reins in the reasoning length. This keeps the model on task and produces concise responses rather than letting it get lost in long-winded internal monologues.
**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template.
### Future Roadmap
Any future REAP uploads to this profile will include specialized experts for:
* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)
---
## Usage with llama-cpp-python
To ensure the **Jan 7th template fix** is applied (overriding the older template embedded in the GGUF metadata), we recommend pulling the template directly from this repository's `tokenizer_config.json` and injecting it via a chat handler, as shown below:
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Wrap the fixed template in a chat handler so it overrides the older
#    template stored in the GGUF metadata
chat_handler = Jinja2ChatFormatter(
    template=tokenizer.chat_template,
    eos_token=tokenizer.eos_token,
    bos_token=tokenizer.bos_token or "",
).to_chat_handler()

# 3. Initialize the Llama model with the injected template
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,          # -1 offloads all layers to the GPU
    chat_handler=chat_handler,
)

# 4. Create a chat completion (rendered with the Jan 7th template fix)
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
response = llm.create_chat_completion(
    messages=messages,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
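Injecting the template through a `chat_handler` at initialization means llama-cpp-python renders every prompt with the repository's latest template rather than the older one embedded in the GGUF, so you do not need to re-download the GGUF after a template update.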
## License
The model weights are licensed under the **Solar-Apache License 2.0**.