**GanitLLM-4B_SFT** is a Bengali mathematical reasoning model trained with Supervised Fine-Tuning on the GANIT dataset. This model serves as the foundation for further RL training (GRPO/CGRPO). Key improvements over the base Qwen3-4B model:

- **+4.80 accuracy** on the Bn-MGSM benchmark (69.20 → 74.00)
- **+4.10 accuracy** on the Bn-MSVAMP benchmark (70.50 → 74.60)
- **86.65% Bengali reasoning** (vs. 14.79% for the base model)
- **80.5% fewer words** in generated solutions (943 → 184 words)

> **Note**: This is the SFT-only checkpoint. For best results, use the RL-enhanced versions: [GanitLLM-4B_SFT_CGRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) or [GanitLLM-4B_SFT_GRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_GRPO).
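The headline numbers above follow directly from the raw figures in each bullet; a quick arithmetic check (all values taken from this README, nothing else assumed):

```python
# Reproduce the deltas reported in the bullet list above.
bn_mgsm_base, bn_mgsm_sft = 69.20, 74.00
bn_msvamp_base, bn_msvamp_sft = 70.50, 74.60
words_base, words_sft = 943, 184

mgsm_gain = round(bn_mgsm_sft - bn_mgsm_base, 2)        # accuracy gain on Bn-MGSM
msvamp_gain = round(bn_msvamp_sft - bn_msvamp_base, 2)  # accuracy gain on Bn-MSVAMP
word_reduction = round((words_base - words_sft) / words_base * 100, 1)  # % fewer words

print(mgsm_gain, msvamp_gain, word_reduction)  # → 4.8 4.1 80.5
```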