---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
pipeline_tag: text-generation
language:
- bn
- en
tags:
- math
- bengali
- reasoning
- sft
datasets:
- dipta007/Ganit
---

# GanitLLM-4B_SFT

[Paper](https://arxiv.org/) | [Dataset](https://huggingface.co/datasets/dipta007/Ganit) | [Model Collection](https://huggingface.co/collections/dipta007/ganitllm)

## Highlights

**GanitLLM-4B_SFT** is a Bengali mathematical reasoning model trained with supervised fine-tuning (SFT) on the GANIT dataset. It serves as the foundation for further RL training (GRPO/CGRPO). Key improvements over the base Qwen3-4B model:

- **+4.80 accuracy** on the Bn-MGSM benchmark (69.20 → 74.00)
- **+4.10 accuracy** on the Bn-MSVAMP benchmark (70.50 → 74.60)
- **86.65% Bengali reasoning** (vs. 14.79% for the base model)
- **80.5% fewer words** per generated solution (943 → 184 words on average)

> **Note**: This is the SFT-only checkpoint. For best results, use the RL-enhanced versions: [GanitLLM-4B_SFT_CGRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) or [GanitLLM-4B_SFT_GRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_GRPO).

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | Causal Language Model |
| **Base Model** | Qwen/Qwen3-4B |
| **Parameters** | 4B |
| **Training** | Supervised Fine-Tuning |
| **Context Length** | 4,096 tokens |
| **Languages** | Bengali, English |

## Training Details

This model was trained with a single-stage pipeline:

1. **Supervised Fine-Tuning (SFT)**: trained on GANIT-SFT (~11k examples) to ground reasoning in Bengali

### Training Data

- **Dataset**: GANIT-SFT (11,023 examples)
- **Format**: Bengali math problems with chain-of-thought reasoning
- **Structure**: `<think>` tags for the reasoning trace, `<answer>` tags for the final answer

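Because the model wraps its output in these tags, downstream code typically parses the `<answer>` span out of the generated text. A minimal sketch (the `extract_answer` helper and the sample string are illustrative, not part of the released code):

```python
import re

def extract_answer(response: str):
    """Return the text inside the first <answer> ... </answer> pair, or None."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

# Toy response in the model's output format ("12 - 5 = 7" in Bengali numerals):
sample = "<think>১২ - ৫ = ৭</think><answer>৭</answer>"
print(extract_answer(sample))  # → ৭
```
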
## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/GanitLLM-4B_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# "A shop has 12 apples. If 5 apples are sold, how many apples will be left?"
problem = "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?"

prompt = f"""A conversation takes place between the user and the assistant. The user asks a question, and the assistant solves the problem. Please reason step by step in Bengali, and put your final answer in the <answer> </answer> tags.

Question: {problem}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```

### Using vLLM

```bash
vllm serve dipta007/GanitLLM-4B_SFT --max-model-len 4096
```

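Once the server is up, it exposes an OpenAI-compatible API (on `localhost:8000` by default). A minimal request sketch using only the Python standard library; the port and the question string are assumptions:

```python
import json
import urllib.request

# Chat-completions payload for the vLLM server started above.
payload = {
    "model": "dipta007/GanitLLM-4B_SFT",
    "messages": [{"role": "user", "content": "Question: ১২ - ৫ = ?"}],
    "max_tokens": 2048,
    "temperature": 0.7,
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # default vLLM address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```
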
## Performance

| Model | Bn-MGSM | Bn-MSVAMP | Avg. words per solution | Bengali reasoning % |
|-------|---------|-----------|-------------------------|---------------------|
| Qwen3-4B (base) | 69.20 | 70.50 | 943 | 14.79% |
| **GanitLLM-4B_SFT** | **74.00** | **74.60** | **184** | **86.65%** |

## Related Models

| Model | Parameters | Training | Link |
|-------|------------|----------|------|
| GanitLLM-4B_SFT_CGRPO | 4B | SFT + CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) |
| GanitLLM-4B_SFT_GRPO | 4B | SFT + GRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT_GRPO) |
| **GanitLLM-4B_SFT** | 4B | SFT | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT) |
| GanitLLM-4B_CGRPO | 4B | CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_CGRPO) |

## Citation

```bibtex
will be updated
```

## License

This model is released under the Apache 2.0 License.