Update README.md

QwQ-32B-ArliAI-RpR-v4 is the third release in the RpR series. It is a 32-billion-parameter model fine-tuned on the RpR dataset, which builds on the curated RPMax dataset, combined with techniques that preserve reasoning ability in long multi-turn chats.

### Recommended Samplers
RpR models do not work well with repetition-penalty-style samplers, even more advanced ones such as XTC or DRY. They work best with simple sampler settings and when allowed to reason at length (a high max-token limit).

Recommended starting settings:
* **Temperature**: 1.0
* **MinP**: 0.02
* **TopK**: 40
* **Response Tokens**: 2048+
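As a concrete sketch, the settings above can be expressed as an OpenAI-compatible chat-completions payload. Note that `min_p` and `top_k` are extensions offered by many open-source inference servers (e.g. llama.cpp, vLLM), not official OpenAI API fields, and the payload below is illustrative, not from this card:

```python
import json

# Sketch: the recommended starting settings as an OpenAI-compatible
# chat-completions payload. `min_p` and `top_k` are server-specific
# extensions supported by many open-source backends.
payload = {
    "model": "QwQ-32B-ArliAI-RpR-v4",
    "messages": [{"role": "user", "content": "Continue the scene."}],
    "temperature": 1.0,   # neutral temperature
    "min_p": 0.02,        # MinP: prune only very unlikely tokens
    "top_k": 40,          # TopK
    "max_tokens": 2048,   # leave room for the reasoning block
    # Deliberately no repetition penalty, XTC, or DRY samplers.
}

body = json.dumps(payload)
```

The point of the simple configuration is that the reasoning block does most of the work; heavy samplers tend to degrade it.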
### Specs

* **Base Model**: QwQ-32B
### Very Nice Training graphs :)

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/J-cD7mjdIG58BsSPpuS6x.png" alt="Train Loss" width="600">

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/T890dqrUcBYnlOzK7MXrU.png" alt="Eval Loss" width="600">
### Quantization
### If you set everything up correctly, it should look like this:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/wFQC8Df9dLaiQGnIg_iEo.png" alt="RpR example response" width="600">
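The check above is visual; if a frontend does not separate the reasoning block automatically, a minimal programmatic check (a sketch assuming the standard `<think>...</think>` delimiters) could look like:

```python
def split_reasoning(text: str):
    """Split a model response into (reasoning, answer).

    Assumes the standard <think>...</think> delimiters. If the
    closing tag never appears, the whole response was generated
    inside the reasoning block, which usually means the think
    tags are misconfigured in the chat template.
    """
    start = text.find("<think>")
    end = text.find("</think>")
    if end == -1:
        # Everything ended up in reasoning: check your template.
        return text, ""
    reasoning = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return reasoning, answer

demo = "<think>She seems nervous; respond gently.</think>*smiles* Hello there."
reasoning, answer = split_reasoning(demo)
```

If `answer` comes back empty on real outputs, that matches the failure mode described in the setup notes: the response never left the reasoning block.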
---