---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- llm
- pruning
- reap
base_model:
- upstage/Solar-Open-100B
---

# Solar Open 69B REAP

> [!NOTE]
> Due to its specific pruning focus, this current version is **not optimized for math**.

This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from 102B to an efficient **69B** parameters.

## Model Highlights

* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
* **Calibration:** Pruned on **4x NVIDIA A100 SXM** GPUs using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset.
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and runaway ("non-stop yapping") generations.

## Recommended Quants

For GGUF usage, we recommend these versions from the **mradermacher** team:

- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF)
- [Solar-Open-69B-REAP-GGUF (standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF)

---

## Technical Details & Acknowledgments

The goal of this REAP pass was to reduce the overhead of the massive 102B MoE architecture while maintaining strong instruction-following and coding performance.

### Chat Template Fix

The custom chat template implemented on **Jan 7th** reins in the reasoning length. This keeps the model on task and produces concise responses rather than long-winded internal monologues.

**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template.
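Conceptually, REAP scores each expert from the router's activation statistics on a calibration set and drops the lowest-scoring experts. The sketch below is purely illustrative and is not the Cerebras implementation (the real REAP saliency criterion also weighs each expert's output norms, not just its routing weights); `reap_saliency` and `reap_prune` are hypothetical helper names:

```python
import math

def reap_saliency(router_logits):
    """Average softmax routing weight per expert over calibration tokens.

    router_logits: list of per-token lists, shape (num_tokens, num_experts).
    Returns one saliency score per expert.
    """
    num_experts = len(router_logits[0])
    totals = [0.0] * num_experts
    for logits in router_logits:
        m = max(logits)                           # stabilize the softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            totals[i] += e / z                    # per-token routing weight
    return [t / len(router_logits) for t in totals]

def reap_prune(router_logits, n_keep):
    """Return the indices of the n_keep experts with the highest saliency."""
    saliency = reap_saliency(router_logits)
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])
```

For example, if the router consistently favors experts 0 and 1 on the calibration data, `reap_prune(logits, 2)` keeps `[0, 1]` and the remaining experts are removed from the checkpoint, shrinking the parameter count without retraining.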
### Future Roadmap

Any future REAP uploads to this profile will include specialized experts for:

* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)

---

## Usage with llama-cpp-python

To ensure the **Jan 7th template fix** is applied (overriding the older metadata baked into the GGUF), pull the updated template from this repository's `tokenizer_config.json` and render the prompt with it before generation:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Initialize the Llama model
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU
)

# 3. Render the prompt with the fixed template, then generate
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = llm(prompt, max_tokens=512, temperature=0.7)
print(response["choices"][0]["text"])
```

Rendering the prompt with `tokenizer.apply_chat_template` guarantees the Jan 7th template is used regardless of which template is embedded in the GGUF metadata.

## License

The model weights are licensed under the **Solar-Apache License 2.0**.