OwenArli committed · Commit cfd0ec3 · verified · 1 Parent(s): 7211f9c

Update README.md

Files changed (1): README.md (+14 −3)

README.md CHANGED
@@ -48,6 +48,17 @@ Ask questions in our new Discord Server https://discord.com/invite/t75KbPgwhk or
 
 QwQ-32B-ArliAI-RpR-v4 is the third release in the RpR series. It is a 32-billion-parameter model fine-tuned using the RpR dataset, which is based on the curated RPMax dataset, combined with techniques to maintain reasoning abilities in long multi-turn chats.
 
+### Recommended Samplers
+
+RpR models do not work well with repetition-penalty-style samplers, even more advanced ones such as XTC or DRY. They work best with simple sampler settings and when allowed to reason for a long time (high max tokens).
+
+Recommended starting point:
+
+* **Temperature**: 1.0
+* **MinP**: 0.02
+* **TopP**: 40
+* **Response Tokens**: 2048+
+
 ### Specs
 
 * **Base Model**: QwQ-32B
@@ -67,9 +78,9 @@ QwQ-32B-ArliAI-RpR-v4 is the third release in the RpR series. It is a 32-billion
 
 ### Very Nice Training graphs :)
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/gBGmhMB0kgoJTmxs-fvtk.png" alt="Train Loss" width="600">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/J-cD7mjdIG58BsSPpuS6x.png" alt="Train Loss" width="600">
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/DtdMtuoA4bX8mKmxOSY10.png" alt="Eval Loss" width="600">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/T890dqrUcBYnlOzK7MXrU.png" alt="Eval Loss" width="600">
 
 ### Quantization
@@ -102,7 +113,7 @@ If you see the whole response is in the reasoning block, then your \<think> and
 
 ### If you set everything up correctly, it should look like this:
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/IDs6FooZgVTIBNHFHZUZB.png" alt="RpR example response" width="600">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/wFQC8Df9dLaiQGnIg_iEo.png" alt="RpR example response" width="600">
 
 ---
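The recommended sampler settings added in this commit can be sketched as an OpenAI-compatible request body, the way most open-source inference servers (vLLM, llama.cpp server, etc.) accept them. This is a minimal sketch, not part of the model card: the model name is assumed from the card's title, `min_p` is a server-specific extension rather than a core OpenAI parameter, and the card's "TopP: 40" is outside the usual 0–1 `top_p` range (it reads like a top-k-style value), so it is deliberately left out here.

```python
# Sketch: applying the card's recommended RpR samplers via an
# OpenAI-compatible chat payload. Model name and parameter support
# are assumptions, not guarantees from the card.

def rpr_sampler_payload(messages, model="QwQ-32B-ArliAI-RpR-v4"):
    """Build a request body using the card's recommended samplers.

    Repetition-penalty samplers (including XTC and DRY) are omitted
    on purpose, per the card's advice that RpR models work best with
    simple sampler settings.
    """
    return {
        "model": model,
        "messages": messages,
        "temperature": 1.0,   # Temperature: 1.0
        "min_p": 0.02,        # MinP: 0.02 (extension on many OSS servers)
        "max_tokens": 2048,   # Response Tokens: 2048+ so reasoning can finish
    }

payload = rpr_sampler_payload([{"role": "user", "content": "Hello!"}])
```

The high `max_tokens` matters most: the reasoning block counts against the response budget, so a low limit can cut the answer off inside `<think>`.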