dipta007 committed 7daae0b (verified) · Parent(s): a3beed9

Update README.md

Files changed: README.md (+122 −12)
---
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
- math
- reasoning
- computational-graph
- bangla
- low-resource
- distractor-aware
- sft
base_model:
- google/gemma-3-12b-it
language:
- bn
- en
datasets:
- dipta007/dagger
- dipta007/DistractMath-Bn
---

# DAGGER-12B-SFT

<a href="https://arxiv.org/abs/XXXX.XXXXX" target="_blank">
  <img alt="arXiv" src="https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/your-username/dagger" target="_blank">
  <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Code-black" style="display: inline-block; vertical-align: middle;"/>
</a>

## Model Description

**DAGGER-12B-SFT** is a supervised fine-tuned model for computational-graph generation in Bangla mathematical reasoning. It is the SFT-only variant, usable both as a standalone model and as the initialization for GRPO training.

## Highlights

- **SFT-only training** on 3,000 verified computational-graph examples
- **Strong baseline performance** for distractor-aware reasoning
- **Foundation for GRPO**: used as the initialization for [dagger-12B_SFT_GRPO](https://huggingface.co/dipta007/dagger-12B_SFT_GRPO)
- **Efficient inference**: ~400 tokens per problem

## Model Overview

| Attribute | Value |
|-----------|-------|
| Base Model | Gemma-3-12B-Instruct |
| Training | Supervised Fine-Tuning |
| Parameters | 12B |
| LoRA Rank | 64 |
| Max Sequence Length | 4096 |

## Performance

Accuracy on the original benchmarks and their distractor-augmented variants (drop in percentage points):

| Dataset | Original | +Distractor | Drop |
|---------|----------|-------------|------|
| MGSM | 70.0 | 56.8 | 13.2 |
| MSVAMP | 76.8 | 65.4 | 11.5 |

Overall weighted average accuracy: **66.7**.

### Comparison with GRPO

| Model | Weighted Avg Accuracy |
|-------|----------------------|
| dagger-12B_SFT | 66.7 |
| dagger-12B_SFT_GRPO | **69.4** (+2.7) |

GRPO yields a +2.7-point improvement over SFT alone.

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/dagger-12B_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# "Mina has 100 pens. Each pen costs 5 taka. How much money will Mina get if she sells all the pens?"
question = "মিনার কাছে ১০০টি কলম আছে। প্রতিটি কলমের দাম ৫ টাকা। মিনা সব কলম বিক্রি করলে কত টাকা পাবে?"

messages = [
    {"role": "system", "content": "You are an expert Bangla Math Reasoner. Solve by constructing a Computational Graph."},
    {"role": "user", "content": question},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```
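The generated response is free-form text, so downstream evaluation needs to recover the final answer from it. Since the exact computational-graph output format is not documented in this card, the following is only a heuristic sketch: it takes the last number in the text, normalizing Bangla digits first. `extract_final_answer` is an illustrative name, not part of this repository.

```python
import re

# Map Bangla digits to ASCII so both "৫০০" and "500" parse the same way.
_BN_TO_ASCII = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

def extract_final_answer(response):
    """Return the last number mentioned in a generated response, or None.

    Heuristic: assumes the final answer is the last numeric token in the
    text, which may not hold for every output format.
    """
    normalized = response.translate(_BN_TO_ASCII)
    numbers = re.findall(r"-?\d+(?:\.\d+)?", normalized)
    return float(numbers[-1]) if numbers else None

print(extract_final_answer("মিনা মোট ৫০০ টাকা পাবে।"))  # 500.0
```

For the example question above, comparing the extracted value against the reference answer gives a simple exact-match accuracy metric.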

## Training Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank / Alpha | 64 / 128 |
| Global Batch Size | 256 |
| Epochs | 4 |
| Learning Rate | 1e-5 → 1e-6 |
| Optimizer | AdamW |
| Weight Decay | 0.001 |
| Precision | BF16 |
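The table above can be translated into a configuration sketch using `peft` and `transformers`. This is illustrative only: the per-device batch size, gradient-accumulation split, and cosine schedule are assumptions chosen to match the stated global batch of 256 and the 1e-5 → 1e-6 decay, and `output_dir` is a placeholder.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter setup mirroring the table: rank 64, alpha 128.
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
)

# Optimizer/schedule settings from the table. The 4 x 64 split into
# per-device batch and gradient accumulation is an assumption; only the
# global batch size of 256 is stated in the card.
training_args = TrainingArguments(
    output_dir="dagger-12b-sft",        # placeholder path
    num_train_epochs=4,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",         # assumed; the card only states 1e-5 -> 1e-6
    per_device_train_batch_size=4,
    gradient_accumulation_steps=64,     # 4 * 64 = 256 effective batch
    optim="adamw_torch",
    weight_decay=0.001,
    bf16=True,
)
```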

## When to Use This Model

- **As a baseline**: compare against the GRPO-enhanced variants
- **For GRPO initialization**: use as the starting point for policy optimization
- **Resource-constrained settings**: when GRPO training is not feasible
- **Research**: studying the effect of SFT vs. GRPO on graph generation
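Because this checkpoint doubles as a GRPO initialization, the setup might be sketched with TRL's `GRPOTrainer` as below. The reward function, dataset choice and columns, split name, and `output_dir` are all placeholders; the actual DAGGER reward design (e.g. graph validity) is not published in this card.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: 1.0 when the completion mentions the reference answer.
# The real DAGGER reward is not described in this card.
def answer_reward(completions, answer, **kwargs):
    return [1.0 if str(a) in str(c) else 0.0 for c, a in zip(completions, answer)]

# Assumes "prompt" and "answer" columns and a "train" split; check the actual schema.
train_dataset = load_dataset("dipta007/DistractMath-Bn", split="train")

trainer = GRPOTrainer(
    model="dipta007/dagger-12B_SFT",                    # this checkpoint as the policy init
    reward_funcs=answer_reward,
    args=GRPOConfig(output_dir="dagger-12b-sft-grpo"),  # placeholder path
    train_dataset=train_dataset,
)
trainer.train()
```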

## Related Models

| Model | Training | Weighted Avg Accuracy |
|-------|----------|-----------------------|
| **dagger-12B_SFT** (this model) | SFT | 66.7 |
| [dagger-12B_SFT_GRPO](https://huggingface.co/dipta007/dagger-12B_SFT_GRPO) | SFT → GRPO | **69.4** |
| [dagger-12B_GRPO](https://huggingface.co/dipta007/dagger-12B_GRPO) | Base → GRPO | 69.4 |

## Citation

```bibtex
% To be updated upon release of the paper.
```