Phu Nguyen
committed on
Update README.md
--- a/README.md
+++ b/README.md
@@ -7,7 +7,8 @@
 ## Training Methodology
 
 - **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)
-- **Algorithm**: GRPO
+- **Algorithm**: GRPO
+- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 - **Reward Modeling**
 - **Answer correctness reward**
 <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/X15GjihIRO9hkfL361Pfd.png" width="300">