---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
pipeline_tag: text-generation
language:
- bn
- en
tags:
- math
- bengali
- reasoning
- sft
datasets:
- dipta007/Ganit
---

# GanitLLM-4B_SFT

[Paper](https://arxiv.org/) | [Dataset](https://huggingface.co/datasets/dipta007/Ganit) | [Model Collection](https://huggingface.co/collections/dipta007/ganitllm)

## Highlights

**GanitLLM-4B_SFT** is a Bengali mathematical reasoning model trained with supervised fine-tuning (SFT) on the GANIT dataset. It serves as the foundation for further RL training (GRPO/CGRPO). Key improvements over the base Qwen3-4B model:

- **+4.80 accuracy** on the Bn-MGSM benchmark (69.20 → 74.00)
- **+4.10 accuracy** on the Bn-MSVAMP benchmark (70.50 → 74.60)
- **86.65% Bengali reasoning** (vs. 14.79% for the base model)
- **80.5% fewer words** per generated solution (943 → 184 words on average)

> **Note**: This is the SFT-only checkpoint. For best results, use the RL-enhanced versions: [GanitLLM-4B_SFT_CGRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) or [GanitLLM-4B_SFT_GRPO](https://huggingface.co/dipta007/GanitLLM-4B_SFT_GRPO).

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | Causal Language Model |
| **Base Model** | Qwen/Qwen3-4B |
| **Parameters** | 4B |
| **Training** | Supervised Fine-Tuning |
| **Context Length** | 4,096 tokens |
| **Languages** | Bengali, English |

## Training Details

This model was trained with a single-stage pipeline:

1. **Supervised Fine-Tuning (SFT)**: trained on GANIT-SFT (~11k examples) to ground reasoning in Bengali

### Training Data

- **Dataset**: GANIT-SFT (11,023 examples)
- **Format**: Bengali math problems with chain-of-thought reasoning
- **Structure**: `<think>` tags for the reasoning trace, `<answer>` tags for the final answer

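Because the model wraps its output in these tags, downstream code typically parses the `<answer>` span out of the generated text. A minimal sketch (the `extract_answer` helper and the sample string are illustrative, not part of the released code):

```python
import re

def extract_answer(response: str):
    """Return the text inside the first <answer> ... </answer> pair, or None."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

# Toy response in the model's output format ("12 - 5 = 7" in Bengali numerals):
sample = "<think>১২ - ৫ = ৭</think><answer>৭</answer>"
print(extract_answer(sample))  # → ৭
```
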
## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/GanitLLM-4B_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# "A shop has 12 apples. If 5 apples are sold, how many apples will be left?"
problem = "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?"

prompt = f"""A conversation takes place between the user and the assistant. The user asks a question, and the assistant solves the problem. Please reason step by step in Bengali, and put your final answer in the <answer> </answer> tags.

Question: {problem}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```

### Using vLLM

```bash
vllm serve dipta007/GanitLLM-4B_SFT --max-model-len 4096
```

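Once the server is up, it exposes an OpenAI-compatible API (on `localhost:8000` by default). A minimal request sketch using only the Python standard library; the port and the question string are assumptions:

```python
import json
import urllib.request

# Chat-completions payload for the vLLM server started above.
payload = {
    "model": "dipta007/GanitLLM-4B_SFT",
    "messages": [{"role": "user", "content": "Question: ১২ - ৫ = ?"}],
    "max_tokens": 2048,
    "temperature": 0.7,
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # default vLLM address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```
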
## Performance

| Model | Bn-MGSM | Bn-MSVAMP | Avg. words per solution | Bengali reasoning % |
|-------|---------|-----------|-------------------------|---------------------|
| Qwen3-4B (base) | 69.20 | 70.50 | 943 | 14.79% |
| **GanitLLM-4B_SFT** | **74.00** | **74.60** | **184** | **86.65%** |

## Related Models

| Model | Parameters | Training | Link |
|-------|------------|----------|------|
| GanitLLM-4B_SFT_CGRPO | 4B | SFT + CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) |
| GanitLLM-4B_SFT_GRPO | 4B | SFT + GRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT_GRPO) |
| **GanitLLM-4B_SFT** | 4B | SFT | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT) |
| GanitLLM-4B_CGRPO | 4B | CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_CGRPO) |

## Citation

```bibtex
will be updated
```

## License

This model is released under the Apache 2.0 License.