<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/9TIfNBdy29CDnn8NNIQPt.jpeg" alt="clickbait" width="500">

(Image generated using Arli AI Image Generation)

=====================================

## RpR v2 Changes:

- Fixed disassociated thoughts:

A lot of effort has been made to completely re-run the RpR dataset generation in order to make sure the generated thinking tokens now always match what the model responses are.

- Fixed random refusals:

The previous RpR v1 dataset was generated with vanilla QwQ, which caused some refusals in both the thinking and response examples. With RpR v2, the dataset generation is now done using QwQ-abliterated, which prevents any refusals from coming through.

- Used QwQ-abliterated as base:

In an effort to further prevent random refusals and allow the model to do anything you want it to do, RpR v2 now uses an abliterated version of QwQ as the starting base for the LoRA being finetuned.

## RpR Series Overview: Building on RPMax with Reasoning

RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series **builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series**.

## Model Description

QwQ-32B-ArliAI-RpR-v2 is the second release in the RpR series. It is a 32-billion-parameter model fine-tuned on the RpR dataset, which is built from the curated RPMax dataset combined with techniques to maintain reasoning abilities in long multi-turn chats.

### Specs

* **Learning Rate**: 0.00001
* **Gradient accumulation**: 32
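
The gradient-accumulation setting above means the optimizer only steps once per 32 micro-batches, so the effective batch size is the per-device batch multiplied by 32. A minimal sketch of that arithmetic, assuming a per-device batch size of 1 and a single GPU (neither is stated in this card):

```python
# Effective batch size under gradient accumulation:
# gradients from `grad_accum` micro-batches are averaged before
# the optimizer steps, as if they were one large batch.
per_device_batch = 1   # assumed; not stated in this card
grad_accum = 32        # Gradient accumulation (from the Specs above)
num_gpus = 1           # assumed; not stated in this card

effective_batch = per_device_batch * grad_accum * num_gpus
print(effective_batch)  # -> 32
```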

### Very Nice Training graphs :)

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/yfv2Os-TeZ1XtaD10poLS.png" alt="Train Loss" width="600">

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/t6Gu8KANpOM9Fg26_mH6h.png" alt="Eval Loss" width="600">

<img src="https://cdn-uploads.huggingface.co/production/uploads/6625f4a8a8d1362ebcc3851a/gibV01ZrbtKtLBsm_CnbC.png" alt="Grad Norm" width="600">

### Quantization

* **BF16**: https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v2
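
As a usage sketch (not part of the original card), the BF16 weights linked above can be loaded with the standard `transformers` API. The helper name below is an illustration, `device_map="auto"` is one common choice for sharding, and loading a 32B model in BF16 needs roughly 64 GB of accelerator memory:

```python
def load_rpr_v2(model_id: str = "ArliAI/QwQ-32B-ArliAI-RpR-v2"):
    """Load the BF16 weights of the model; requires ~64 GB of GPU memory."""
    # Imports kept inside the function so merely defining it stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 upload linked above
        device_map="auto",           # shard across available accelerators
    )
    return tokenizer, model
```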