dipta007 committed · Commit d8e07b5 · verified · Parent(s): 3db14da

Update README.md
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
pipeline_tag: text-generation
language:
- bn
- en
tags:
- math
- bengali
- reasoning
- grpo
- curriculum-learning
datasets:
- dipta007/Ganit
---

# GanitLLM-1.7B_SFT_CGRPO

[![Paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/)
[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/dipta007/Ganit)
[![Models](https://img.shields.io/badge/HuggingFace-Models-orange)](https://huggingface.co/collections/dipta007/ganitllm)

## Highlights

**GanitLLM-1.7B_SFT_CGRPO** is a compact Bengali mathematical reasoning model trained with the novel **Curriculum-GRPO** approach. Key improvements over the base Qwen3-1.7B model:

- **+37.6 accuracy points** on the Bn-MGSM benchmark (15.2 → 52.8)
- **+52.7 accuracy points** on the Bn-MSVAMP benchmark (14.1 → 66.8)
- **87.80% Bengali reasoning** (vs. 19.64% for the base model)
- **81.3% shorter generated solutions** (1,124 → 210 words on average)

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | Causal language model |
| **Base Model** | Qwen/Qwen3-1.7B |
| **Parameters** | 1.7B |
| **Training** | SFT + Curriculum-GRPO |
| **Context Length** | 4,096 tokens |
| **Languages** | Bengali, English |

## Training Details

This model was trained with our multi-stage pipeline:

1. **Supervised Fine-Tuning (SFT)**: trained on GANIT-SFT (~11k examples) to ground reasoning in Bengali
2. **Curriculum-GRPO**: reinforcement learning with difficulty-aware sampling on GANIT-RLVR (~7.3k examples)
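
The difficulty-aware sampling in stage 2 is not spelled out here; as a rough sketch (the `curriculum_sample` helper, the normalized difficulty scores, and the linear easy-to-hard schedule are all illustrative assumptions, not the released training code), a curriculum sampler could widen the pool of eligible difficulties as training progresses:

```python
import random

def curriculum_sample(examples, step, total_steps, rng=random):
    """Sample one training example, gradually admitting harder problems.

    `examples` is a list of (problem, difficulty) pairs with difficulty
    normalized to [0, 1]. Early in training only easy problems are
    eligible; by the final step the whole pool is. (Hypothetical sketch,
    not the paper's exact schedule.)
    """
    # Linear schedule: fraction of the difficulty range currently allowed.
    max_difficulty = min(1.0, 0.25 + 0.75 * step / total_steps)
    eligible = [ex for ex, d in examples if d <= max_difficulty]
    return rng.choice(eligible)

pool = [("easy sum", 0.1), ("two-step word problem", 0.5), ("multi-step proof", 0.9)]
```

At step 0 only the easiest bucket is eligible; by the last step every example can be drawn.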

### Reward Functions

- **Format Reward**: validates the `<think>` and `<answer>` tag structure
- **Correctness Reward**: +2.0 for a Bengali answer match, +1.0 for an English match
- **Bengali Reasoning Reward**: encourages >80% Bengali text in the reasoning
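
A minimal sketch of how such rewards could be scored, assuming only the values stated above (tag-based format check, +2.0/+1.0 correctness, >80% Bengali-character ratio); the released implementations may differ:

```python
import re

def format_reward(completion: str) -> float:
    """Reward a well-formed <think>...</think><answer>...</answer> output."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0

def correctness_reward(answer: str, gold_bn: str, gold_en: str) -> float:
    """+2.0 for a Bengali answer match, +1.0 for an English match."""
    if answer.strip() == gold_bn:
        return 2.0
    if answer.strip() == gold_en:
        return 1.0
    return 0.0

def bengali_reasoning_reward(reasoning: str, threshold: float = 0.8) -> float:
    """Reward reasoning whose alphabetic characters are >80% Bengali."""
    letters = [ch for ch in reasoning if ch.isalpha()]
    if not letters:
        return 0.0
    # Bengali Unicode block: U+0980 through U+09FF.
    bengali = sum(1 for ch in letters if "\u0980" <= ch <= "\u09ff")
    return 1.0 if bengali / len(letters) > threshold else 0.0
```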

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/GanitLLM-1.7B_SFT_CGRPO"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Translation: "A shop has 12 apples. If 5 apples are sold, how many apples will remain?"
problem = "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?"

prompt = f"""A conversation takes place between the user and the assistant. The user asks a question, and the assistant solves the problem. Please reason step by step in Bengali, and put your final answer in the <answer> </answer> tags.

Question: {problem}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True so that temperature actually takes effect.
generated_ids = model.generate(**model_inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```
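
Because the model is trained to put its final answer inside `<answer>` tags, the result can be pulled out of `response` with a small regex helper (an illustrative helper, not part of the repository):

```python
import re

def extract_answer(response):
    """Return the text inside the last <answer>...</answer> block, or None."""
    matches = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    return matches[-1].strip() if matches else None
```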

### Using vLLM

```bash
vllm serve dipta007/GanitLLM-1.7B_SFT_CGRPO --max-model-len 4096
```

## Performance

| Model | Bn-MGSM | Bn-MSVAMP | Avg. Words | Bengali % |
|-------|---------|-----------|------------|-----------|
| Qwen3-1.7B (base) | 15.20 | 14.10 | 1,124 | 19.64% |
| **GanitLLM-1.7B_SFT_CGRPO** | **52.80** | **66.80** | **210** | **87.80%** |

## Related Models

| Model | Parameters | Training | Link |
|-------|------------|----------|------|
| GanitLLM-4B_SFT_CGRPO | 4B | SFT + CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-4B_SFT_CGRPO) |
| **GanitLLM-1.7B_SFT_CGRPO** | 1.7B | SFT + CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-1.7B_SFT_CGRPO) |
| GanitLLM-1.7B_SFT_GRPO | 1.7B | SFT + GRPO | [Link](https://huggingface.co/dipta007/GanitLLM-1.7B_SFT_GRPO) |
| GanitLLM-1.7B_CGRPO | 1.7B | CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-1.7B_CGRPO) |
| GanitLLM-0.6B_SFT_CGRPO | 0.6B | SFT + CGRPO | [Link](https://huggingface.co/dipta007/GanitLLM-0.6B_SFT_CGRPO) |

## Citation

```bibtex
will be updated
```

## License

This model is released under the Apache 2.0 License.