dipta007 committed
Commit fb2cd0d · verified · 1 Parent(s): 827cffc

Create README.md

Files changed (1)
  1. README.md +136 -0

README.md ADDED
@@ -0,0 +1,136 @@
---
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
- math
- reasoning
- computational-graph
- bangla
- low-resource
- distractor-aware
- small-model
base_model:
- google/gemma-3-4b-it
language:
- bn
- en
datasets:
- dipta007/dagger
- dipta007/DistractMath-Bn
---

# DAGGER-4B-SFT-GRPO

<a href="https://arxiv.org/abs/XXXX.XXXXX" target="_blank">
  <img alt="arXiv" src="https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/dipta007/dagger" target="_blank">
  <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Code-black" style="display: inline-block; vertical-align: middle;"/>
</a>

## Model Description

**DAGGER-4B-SFT-GRPO** is the smaller variant of DAGGER, trained with SFT followed by GRPO on Gemma-3-4B. Although it scores lower than the 12B variant, it demonstrates that the DAGGER framework scales down to smaller models.

## Highlights

- **Lightweight**: 4B parameters for resource-constrained deployment
- **SFT → GRPO training**: Full training pipeline
- **Improved over baselines**: Still outperforms CoT on distractor robustness
- **Capacity study**: Demonstrates model size requirements for graph generation

## Model Overview

| Attribute | Value |
|-----------|-------|
| Base Model | Gemma-3-4B-Instruct |
| Training | SFT → GRPO |
| Parameters | 4B |
| LoRA Rank | 64 |

## Performance

| Dataset | Original | +Distractor | Drop |
|---------|----------|-------------|------|
| MGSM | 54.8 | 31.4 | 23.4 |
| MSVAMP | 70.3 | 42.9 | 27.4 |
| **Weighted Avg** | - | - | **47.3** |

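The **Drop** column is the accuracy lost when distractors are added (Original minus +Distractor), and the **Weighted Avg** figure (47.3) is the single overall score reused in the comparison tables below. The following is a minimal sketch of the per-dataset drop arithmetic using the numbers from the table; the weighting scheme behind the overall average is defined in the paper and is not reproduced here.

```python
# Recompute the per-dataset accuracy drops reported in the table above.
# Numbers are copied from the table; this is illustrative bookkeeping only.
scores = {
    "MGSM":   {"original": 54.8, "with_distractor": 31.4},
    "MSVAMP": {"original": 70.3, "with_distractor": 42.9},
}

for dataset, acc in scores.items():
    drop = acc["original"] - acc["with_distractor"]
    print(f"{dataset}: {acc['original']} -> {acc['with_distractor']} (drop {drop:.1f})")
```
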
### Comparison with 12B Variant

| Model | Params | Weighted Avg |
|-------|--------|--------------|
| dagger-4B_SFT_GRPO | 4B | 47.3 |
| dagger-12B_SFT_GRPO | 12B | **69.4** (+22.1) |

**Key Finding**: The 12B model provides a +22 point improvement, suggesting a capacity threshold for effective computational graph generation.

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/dagger-4B_SFT_GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Example Bangla word problem: "Mina has 100 pens. Each pen costs 5 taka."
question = "মিনার কাছে ১০০টি কলম আছে। প্রতিটি কলমের দাম ৫ টাকা।"

messages = [
    {"role": "system", "content": "You are an expert Bangla Math Reasoner. Solve by constructing a Computational Graph."},
    {"role": "user", "content": question},
]

# Build the chat prompt and generate the computational-graph solution.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```

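The generated `response` contains the model's computational-graph reasoning. If you only need a final numeric answer (for example, to score the model on a benchmark), a last-number heuristic like the sketch below is a common post-processing step; it is a generic illustration and not the paper's official evaluation code.

```python
import re

def extract_final_answer(response: str) -> str | None:
    """Return the last number in the generated text, or None if absent.

    This is a heuristic; the official evaluation may instead parse the
    answer node of the computational graph. Note that \\d also matches
    Bangla digits (০-৯) in Python's Unicode-aware regex.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None

print(extract_final_answer("... so the total cost is 500 taka."))  # -> "500"
```
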
## Training Configuration

Same as the 12B variant:

| Parameter | Value |
|-----------|-------|
| LoRA Rank / Alpha | 64 / 128 |
| SFT Batch Size | 256 |
| GRPO Batch Size | 32 |
| Generations per Prompt | 8 |
| Epochs | 4 |

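For readers who want to set up a comparable run, the sketch below shows one way the hyperparameters in the table could map onto PEFT and TRL configuration objects. It is an assumption-laden illustration (library choice, target modules, learning rates, per-device batch sizes, and dataset wiring are not specified by this card); see the linked GitHub repository for the authoritative training setup.

```python
# Illustrative only: maps the table's hyperparameters onto PEFT/TRL configs.
# Assumes TRL's SFT/GRPO trainers and a LoRA adapter; per-device batch sizes
# and output paths are placeholders, not the paper's values.
from peft import LoraConfig
from trl import SFTConfig, GRPOConfig

lora_config = LoraConfig(
    r=64,                 # LoRA Rank (table)
    lora_alpha=128,       # LoRA Alpha (table)
    task_type="CAUSAL_LM",
)

sft_config = SFTConfig(
    output_dir="dagger-4b-sft",
    num_train_epochs=4,                 # Epochs (table)
    per_device_train_batch_size=8,      # placeholder
    gradient_accumulation_steps=32,     # 8 * 32 = 256 (SFT Batch Size)
)

grpo_config = GRPOConfig(
    output_dir="dagger-4b-sft-grpo",
    num_generations=8,                  # Generations per Prompt (table)
    per_device_train_batch_size=8,      # placeholder
    gradient_accumulation_steps=4,      # 8 * 4 = 32 (GRPO Batch Size)
)
```
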
## When to Use This Model

- **Resource-constrained deployment**: When 12B is too large
- **Capacity studies**: Research on model size vs. performance
- **Edge deployment**: Smaller memory footprint
- **Prototyping**: Faster iteration during development

## Limitations

- **Lower accuracy**: 22 points below the 12B variant
- **Reduced robustness**: Larger accuracy drop under distractors
- **Capacity constraints**: May struggle with complex multi-step problems

## Related Models

| Model | Size | Weighted Avg |
|-------|------|--------------|
| **dagger-4B_SFT_GRPO** | 4B | 47.3 |
| [dagger-4B_SFT](https://huggingface.co/dipta007/dagger-4B_SFT) | 4B | 44.3 |
| [dagger-12B_SFT_GRPO](https://huggingface.co/dipta007/dagger-12B_SFT_GRPO) | 12B | **69.4** |

## Citation

```bibtex
will be updated
```