This repository contains a pruned version of Upstage's **Solar-Open-100B**.
* **Pruning Method:** REAP (Router Expert Activation Pruning) based on the [Cerebras Research REAP implementation](https://github.com/CerebrasResearch/reap).
* **Optimization:** Pruned using ~100 samples from the `nickrosh/Evol-Instruct-Code-80k-v1` dataset.
* **Hardware:** Pruned on 4x NVIDIA A100 SXM.
* **Custom Chat Template:** Features a specialized template to manage reasoning length and prevent "non-stop" generation issues.
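For a rough sense of the footprint reduction, a back-of-envelope sketch (parameter counts are taken from the model names; real memory use also depends on activations and the KV cache):

```python
# Back-of-envelope weight memory at bf16 (2 bytes per parameter).
BYTES_PER_PARAM = 2  # bf16

def weight_gb(num_params: float) -> float:
    return num_params * BYTES_PER_PARAM / 1e9

original = weight_gb(102e9)  # Solar-Open-100B (~102B params)
pruned = weight_gb(69e9)     # this pruned model (~69B params)
print(f"original: {original:.0f} GB, pruned: {pruned:.0f} GB")
# → original: 204 GB, pruned: 138 GB
```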

## Links to Quants
- [Solar Open 69B REAP GGUF](https://huggingface.co/Akicou/Solar-Open-69B-REAP-GGUF)

---

## Technical Details & Updates

This model was created by modifying a clone of the Cerebras REAP repository. The goal was to reduce the overhead of the 102B MoE architecture while maintaining high performance in core tasks.
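The exact pruning criterion lives in the Cerebras REAP repository; as a toy illustration of the general idea (rank experts by router activation over calibration data and drop the weakest), with all numbers synthetic and the saliency score simplified:

```python
import numpy as np

def select_experts(router_probs: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` experts with the highest mean router activation.

    `router_probs` has shape (num_tokens, num_experts). This mean-activation
    saliency is illustrative only, not the exact REAP criterion.
    """
    saliency = router_probs.mean(axis=0)          # (num_experts,)
    return np.sort(np.argsort(saliency)[-keep:])  # ids of retained experts

# Synthetic calibration routing: 1000 tokens over 8 experts.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(8), size=1000)
kept = select_experts(probs, keep=6)
print(kept)  # ids of the 6 retained experts
```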

### Chat Template Fix (Jan 7, ~19:00)

As of **January 7th, 2026 (~19:00)**, the `chat_template` has been updated and fixed. This resolves issues where the model would enter infinite reasoning loops or produce excessively long responses; the new template deliberately shortens the reasoning phase to keep outputs concise and usable.
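The fix itself lives in the repo's `chat_template`; if you are running an older snapshot, a rough client-side guard can approximate it by truncating a runaway reasoning span after generation (the `<think>` tag names below are hypothetical, not necessarily what this model emits):

```python
def cap_reasoning(text: str, max_reason_chars: int = 2000,
                  open_tag: str = "<think>", close_tag: str = "</think>") -> str:
    """Truncate an overlong (or unterminated) reasoning span in model output."""
    start = text.find(open_tag)
    if start == -1:
        return text  # no reasoning span present
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        end = len(text)  # model never closed its reasoning: cut it off
    reason = text[body_start:end][:max_reason_chars]
    rest = text[end + len(close_tag):] if text.startswith(close_tag, end) else ""
    return text[:start] + open_tag + reason + close_tag + rest

print(cap_reasoning("<think>" + "blah " * 1000 + "</think>Answer: 42", 50))
```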

### Acknowledgments

Special thanks to **[Barney Greenway](https://huggingface.co/McG-221)** for reporting the long-reasoning/non-stop generation issues, which directly led to the template fix.

### Future Roadmap

Future REAP uploads to this profile will include specialized experts for:
* Advanced Mathematics
* Function-calling
* SWE-environment (Software Engineering)

> [!NOTE]
> Due to the pruning and current template constraints, this model is currently **not optimized for complex math**.

---

## Usage

### Transformers

Ensure you have `transformers`, `accelerate`, and `torch` installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Akicou/Solar-Open-69B-REAP"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input
messages = [{"role": "user", "content": "Explain the benefit of MoE pruning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0]))
```

## License

The model weights are licensed under the **Solar-Apache License 2.0**.