---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- llm
- pruning
- reap
base_model:
- upstage/Solar-Open-100B
---

# Solar Open 69B REAP

> [!NOTE]
> Due to its specific pruning focus, this current version is **not optimized for math**.

This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from 102B to an efficient **69B** parameters.

## Model Highlights

* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
* **Calibration:** Pruned on **4x NVIDIA A100 SXM** GPUs using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset.
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and runaway ("non-stop yapping") generations.

## Recommended Quants

For GGUF usage, we recommend these versions from the **mradermacher** team:

- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF)
- [Solar-Open-69B-REAP-GGUF (standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF)

---

## Technical Details & Acknowledgments

The goal of this REAP pass was to reduce the overhead of the massive 102B MoE architecture while maintaining strong instruction-following and coding performance.

### Chat Template Fix

The custom chat template implemented on **Jan 7th** reins in the reasoning length. This keeps the model on task and produces concise responses rather than long-winded internal monologues.

**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template.
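Conceptually, REAP scores each expert from the router's activation statistics on a calibration set and drops the lowest-scoring experts. The sketch below is purely illustrative and is not the Cerebras implementation (the real REAP saliency criterion also weighs each expert's output norms, not just its routing weights); `reap_saliency` and `reap_prune` are hypothetical helper names:

```python
import math

def reap_saliency(router_logits):
    """Average softmax routing weight per expert over calibration tokens.

    router_logits: list of per-token lists, shape (num_tokens, num_experts).
    Returns one saliency score per expert.
    """
    num_experts = len(router_logits[0])
    totals = [0.0] * num_experts
    for logits in router_logits:
        m = max(logits)                           # stabilize the softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            totals[i] += e / z                    # per-token routing weight
    return [t / len(router_logits) for t in totals]

def reap_prune(router_logits, n_keep):
    """Return the indices of the n_keep experts with the highest saliency."""
    saliency = reap_saliency(router_logits)
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])
```

For example, if the router consistently favors experts 0 and 1 on the calibration data, `reap_prune(logits, 2)` keeps `[0, 1]` and the remaining experts are removed from the checkpoint, shrinking the parameter count without retraining.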
### Future Roadmap

Any future REAP uploads to this profile will include specialized experts for:

* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)

---

## Usage with llama-cpp-python

To ensure the **Jan 7th template fix** is applied (overriding the older metadata baked into the GGUF), pull the updated template from this repository's `tokenizer_config.json` and render the prompt with it before generation:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Initialize the Llama model
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU
)

# 3. Render the prompt with the fixed template, then generate
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = llm(prompt, max_tokens=512, temperature=0.7)
print(response["choices"][0]["text"])
```

Rendering the prompt with `tokenizer.apply_chat_template` guarantees the Jan 7th template is used regardless of which template is embedded in the GGUF metadata.

## License

The model weights are licensed under the **Solar-Apache License 2.0**.