Update README.md
README.md
@@ -78,7 +78,7 @@ This model has been optimized using [Unsloth](https://github.com/unslothai/unslo
 ALLaM-Thinking was trained using a combination of techniques:

 - Base architecture fine-tuned on diverse Arabic datasets
-- GRPO (
+- GRPO (Group Relative Policy Optimization) for better alignment
 - Specialized training on mathematical reasoning and step-by-step problem-solving

 ## Performance
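
For readers unfamiliar with the GRPO bullet touched by this change, here is a minimal sketch of the group-relative advantage that gives the method its name. It is illustrative only and is not the training code used for ALLaM-Thinking. The function name `group_relative_advantages`, the `eps` constant, and the sample rewards are hypothetical. The use of population standard deviation is also an assumption, since implementations differ on this.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps to avoid dividing by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some libraries use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average receive a positive advantage and are reinforced; those below receive a negative one. Because the baseline is the group mean, no separate value model is needed, and a group in which every completion gets the same reward contributes zero advantage.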