OwenArli committed
Commit faa8572 · verified · 1 Parent(s): d195519

Update README.md

Files changed (1): README.md +25 -1

README.md CHANGED
@@ -10,8 +10,24 @@ base_model:
 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/9TIfNBdy29CDnn8NNIQPt.jpeg" alt="clickbait" width="500">
 
+ (Image generated using Arli AI Image Generation)
+
  =====================================
 
+ ## RpR v2 Changes:
+
+ - Fixed disassociated thoughts:
+
+ A lot of effort has been put into completely re-running the RpR dataset generation to make sure the generated thinking tokens now always match the model's responses.
+
+ - Fixed random refusals:
+
+ The previous RpR v1 dataset was generated with vanilla QwQ, which caused some refusals in both the thinking and response examples. With RpR v2, the dataset generation is done using QwQ-abliterated, which prevents any refusals from coming through.
+
+ - Used QwQ-abliterated as base:
+
+ To further prevent random refusals and allow the model to do anything you want it to do, RpR v2 now uses an abliterated version of QwQ as the starting base for the LoRA being fine-tuned.
+
  ## RpR Series Overview: Building on RPMax with Reasoning
 
  RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series **builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series**.
@@ -32,7 +48,7 @@ Ask questions in our new Discord Server https://discord.com/invite/t75KbPgwhk or
 
  ## Model Description
 
- QwQ-32B-ArliAI-RpR-v2 is the first release in the RpR series. It is a 32-billion parameter model fine-tuned using the curated RPMax dataset combined with techniques to maintain reasoning abilities in long multi-turn chats.
+ QwQ-32B-ArliAI-RpR-v2 is the second release in the RpR series. It is a 32-billion-parameter model fine-tuned using the RpR dataset, which builds on the curated RPMax dataset, combined with techniques to maintain reasoning abilities in long multi-turn chats.
 
  ### Specs
 
@@ -50,6 +66,14 @@ QwQ-32B-ArliAI-RpR-v2 is the first release in the RpR series. It is a 32-billion
  * **Learning Rate**: 0.00001
  * **Gradient accumulation**: 32
 
+ ### Very Nice Training graphs :)
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/yfv2Os-TeZ1XtaD10poLS.png" alt="Train Loss" width="600">
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/t6Gu8KANpOM9Fg26_mH6h.png" alt="Eval Loss" width="600">
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/gibV01ZrbtKtLBsm_CnbC.png" alt="Grad Norm" width="600">
+
  ### Quantization
 
  * **BF16**: https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v2
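The **Gradient accumulation**: 32 setting in the specs above can be sketched as a toy loop — a hedged illustration, not ArliAI's actual training code. Only `accumulation_steps=32` and `lr=1e-5` come from the listed specs; the scalar weight and the unit "gradients" are hypothetical stand-ins.

```python
# Toy sketch of gradient accumulation: average gradients over
# `accumulation_steps` micro-batches, then apply one optimizer step.
# Only accumulation_steps=32 and lr=1e-5 reflect the specs above.

def train_with_accumulation(micro_batch_grads, accumulation_steps=32, lr=1e-5):
    """Return (final_weight, optimizer_steps) after plain SGD with
    gradient accumulation over a stream of per-micro-batch gradients."""
    weight = 0.0
    accumulated = 0.0
    optimizer_steps = 0
    for i, grad in enumerate(micro_batch_grads, start=1):
        accumulated += grad / accumulation_steps  # average over the window
        if i % accumulation_steps == 0:           # one update per 32 micro-batches
            weight -= lr * accumulated
            accumulated = 0.0
            optimizer_steps += 1
    return weight, optimizer_steps

# 64 micro-batches with unit gradient -> 2 optimizer steps
print(train_with_accumulation([1.0] * 64))
```

If the per-device batch size were 1 (not stated in the card), this would give an effective batch size of 32 per optimizer update.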