--- |
|
|
language: |
|
|
- en |
|
|
- ko |
|
|
license: other |
|
|
license_name: solar-apache-2.0 |
|
|
tags: |
|
|
- upstage |
|
|
- solar |
|
|
- moe |
|
|
- llm |
|
|
- pruning |
|
|
- reap |
|
|
base_model: |
|
|
- upstage/Solar-Open-100B |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<img src="./Solar-Open-69B-REAP.png" alt="Solar Open 69B REAP" width="100%"> |
|
|
</p> |
|
|
|
|
|
> [!NOTE] |
|
|
> Due to the specific pruning focus, this current version is **not optimized for math**. |
|
|
|
|
|
|
|
|
|
|
|
# Solar-Open-69B-REAP
|
|
|
|
|
This repository contains a pruned version of Upstage's flagship Solar-Open-100B. Using **REAP (Router-weighted Expert Activation Pruning)**, the model has been compressed from ~102B parameters down to a highly efficient **69B parameters**.
|
|
|
|
|
## Model Highlights |
|
|
|
|
|
* **Pruning Method:** REAP (Router-weighted Expert Activation Pruning) via a modified [Cerebras Research](https://github.com/CerebrasResearch/reap) implementation.
|
|
* **Optimization:** Pruned on **4x NVIDIA A100 SXM** using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset. |
|
|
* **Chat Template Fix:** Updated on **Jan 7th, 2026 (~19:00)** to resolve infinite reasoning loops and runaway ("non-stop yapping") generation.
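To make the pruning criterion concrete, here is a minimal sketch of a REAP-style expert saliency score: each expert is ranked by its average router weight times the magnitude of its output over calibration tokens, and the lowest-scoring experts are dropped. The function names, shapes, and toy data below are illustrative assumptions, not the exact Cerebras implementation:

```python
import numpy as np

def reap_scores(gate_probs: np.ndarray, expert_out_norms: np.ndarray) -> np.ndarray:
    """Saliency per expert: mean of (router weight * expert output norm)
    over calibration tokens. Higher = more worth keeping."""
    # gate_probs: (tokens, experts) routing weights
    # expert_out_norms: (tokens, experts) stand-in for ||expert(x)||
    return (gate_probs * expert_out_norms).mean(axis=0)

def prune_experts(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return the sorted indices of the experts to keep."""
    n_keep = max(1, round(len(scores) * keep_ratio))
    return np.sort(np.argsort(scores)[::-1][:n_keep])

# Toy calibration pass: 128 tokens, 8 experts
rng = np.random.default_rng(0)
gates = rng.random((128, 8))
gates /= gates.sum(axis=1, keepdims=True)  # normalize router weights per token
norms = rng.random((128, 8))

kept = prune_experts(reap_scores(gates, norms), keep_ratio=0.69)
print(kept)  # indices of the retained experts
```

In the real pipeline the scores come from forward passes over the calibration set (here, ~100 Evol-Instruct-Code samples) rather than random data.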
|
|
|
|
|
## Recommended Quants |
|
|
|
|
|
For GGUF usage, we recommend these versions from the **mradermacher** team: |
|
|
|
|
|
- [Solar-Open-69B-REAP-i1-GGUF (imatrix)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-i1-GGUF) |
|
|
- [Solar-Open-69B-REAP-GGUF (Standard)](https://huggingface.co/mradermacher/Solar-Open-69B-REAP-GGUF) |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Details & Acknowledgments |
|
|
|
|
|
The goal of this REAP pass was to reduce the overhead of the massive ~102B-parameter MoE architecture while preserving strong instruction-following and coding performance.
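As a rough back-of-the-envelope on that overhead reduction (assuming 16-bit weights and counting only the weights themselves, not KV cache or quantized GGUF sizes):

```python
GIB = 1024 ** 3

def weight_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Raw weight memory in GiB (2 bytes/param for fp16/bf16)."""
    return n_params * bytes_per_param / GIB

before = weight_gib(102e9)  # original ~102B MoE
after = weight_gib(69e9)    # after REAP
print(f"{before:.0f} GiB -> {after:.0f} GiB (saves {before - after:.0f} GiB)")
```

This is roughly a 190 GiB to 129 GiB reduction in raw weight memory, before any quantization.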
|
|
|
|
|
### Chat Template Fix |
|
|
The custom chat template implemented on **Jan 7th** deliberately shortens the model's reasoning traces. This keeps the model on task and produces concise responses instead of long-winded internal monologues.
|
|
|
|
|
**Special Thanks:** Shoutout to **[Barney Greenway](https://huggingface.co/McG-221)** for identifying the nonstop reasoning behavior and providing the feedback needed to fix the template. |
|
|
|
|
|
### Future Roadmap |
|
|
Any future REAP uploads to this profile will include specialized experts for: |
|
|
* Advanced Mathematics |
|
|
* Function-calling |
|
|
* SWE environments (software engineering)
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Usage with llama-cpp-python |
|
|
|
|
|
To ensure the **Jan 7th template fix** is applied (instead of the older template baked into the GGUF metadata), we recommend rendering prompts with the chat template from this repository's `tokenizer_config.json`, as in the code below:
|
|
|
|
|
```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# 1. Pull the latest fixed template from this HF repo
repo_id = "Akicou/Solar-Open-69B-REAP"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 2. Initialize the Llama model
llm = Llama(
    model_path="./Solar-Open-69B-REAP.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers to GPU
)

# 3. Render the prompt with the fixed template, bypassing the
#    older template stored in the GGUF metadata
messages = [
    {"role": "system", "content": "You are a concise and helpful assistant."},
    {"role": "user", "content": "Explain the benefits of MoE pruning."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = llm(prompt, max_tokens=512, temperature=0.7)
print(response["choices"][0]["text"])
```
|
|
|
|
|
## License |
|
|
|
|
|
The model weights are licensed under the **Solar-Apache License 2.0**. |
|
|
|
|
|
|