Update README.md

QwQ-32B-ArliAI-RpR-v4 is the third release in the RpR series. It is a 32-billion-parameter model fine-tuned on the RpR dataset, which builds on the curated RPMax dataset, combined with techniques that preserve reasoning ability in long multi-turn chats.

### Recommended Samplers
RpR models do not work well with repetition-penalty-style samplers, even more advanced ones such as XTC or DRY. They work best with simple sampler settings and when allowed to reason at length (a high max-token limit).

Recommended starting settings:
* **Temperature**: 1.0
* **MinP**: 0.02
* **TopK**: 40
* **Response Tokens**: 2048+
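As a concrete sketch, the settings above can be expressed as an OpenAI-compatible chat-completions payload. Note that `min_p` and `top_k` are extensions offered by many open-source inference servers (e.g. llama.cpp, vLLM), not official OpenAI API fields, and the payload below is illustrative, not from this card:

```python
import json

# Sketch: the recommended starting settings as an OpenAI-compatible
# chat-completions payload. `min_p` and `top_k` are server-specific
# extensions supported by many open-source backends.
payload = {
    "model": "QwQ-32B-ArliAI-RpR-v4",
    "messages": [{"role": "user", "content": "Continue the scene."}],
    "temperature": 1.0,   # neutral temperature
    "min_p": 0.02,        # MinP: prune only very unlikely tokens
    "top_k": 40,          # TopK
    "max_tokens": 2048,   # leave room for the reasoning block
    # Deliberately no repetition penalty, XTC, or DRY samplers.
}

body = json.dumps(payload)
```

The point of the simple configuration is that the reasoning block does most of the work; heavy samplers tend to degrade it.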
### Specs

* **Base Model**: QwQ-32B
### Very Nice Training graphs :)

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/J-cD7mjdIG58BsSPpuS6x.png" alt="Train Loss" width="600">

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/T890dqrUcBYnlOzK7MXrU.png" alt="Eval Loss" width="600">
### Quantization
### If you set everything up correctly, it should look like this:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/wFQC8Df9dLaiQGnIg_iEo.png" alt="RpR example response" width="600">
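The check above is visual; if a frontend does not separate the reasoning block automatically, a minimal programmatic check (a sketch assuming the standard `<think>...</think>` delimiters) could look like:

```python
def split_reasoning(text: str):
    """Split a model response into (reasoning, answer).

    Assumes the standard <think>...</think> delimiters. If the
    closing tag never appears, the whole response was generated
    inside the reasoning block, which usually means the think
    tags are misconfigured in the chat template.
    """
    start = text.find("<think>")
    end = text.find("</think>")
    if end == -1:
        # Everything ended up in reasoning: check your template.
        return text, ""
    reasoning = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return reasoning, answer

demo = "<think>She seems nervous; respond gently.</think>*smiles* Hello there."
reasoning, answer = split_reasoning(demo)
```

If `answer` comes back empty on real outputs, that matches the failure mode described in the setup notes: the response never left the reasoning block.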
---