dipta007 committed on
Commit 78368a8 · verified · 1 Parent(s): 27f4552

Update README.md

Files changed (1)
  1. README.md +110 -12
README.md CHANGED
@@ -1,21 +1,119 @@
1
  ---
2
- base_model: unsloth/gemma-3-4b-it
 
 
 
3
  tags:
4
- - text-generation-inference
5
- - transformers
6
- - unsloth
7
- - gemma3
8
- license: apache-2.0
 
 
 
 
 
9
  language:
 
10
  - en
 
 
 
11
  ---
12
 
13
- # Uploaded finetuned model
14
 
15
- - **Developed by:** dipta007
16
- - **License:** apache-2.0
17
- - **Finetuned from model :** unsloth/gemma-3-4b-it
 
 
 
18
 
19
- This gemma3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
20
 
21
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
1
  ---
2
+ library_name: transformers
3
+ license: gemma
4
+ license_link: https://ai.google.dev/gemma/terms
5
+ pipeline_tag: text-generation
6
  tags:
7
+ - math
8
+ - reasoning
9
+ - computational-graph
10
+ - bangla
11
+ - low-resource
12
+ - distractor-aware
13
+ - sft
14
+ - small-model
15
+ base_model:
16
+ - google/gemma-3-4b-it
17
  language:
18
+ - bn
19
  - en
20
+ datasets:
21
+ - dipta007/dagger
22
+ - dipta007/DistractMath-Bn
23
  ---
24
 
25
+ # DAGGER-4B-SFT
26
 
27
+ <a href="https://arxiv.org/abs/XXXX.XXXXX" target="_blank">
28
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b" style="display: inline-block; vertical-align: middle;"/>
29
+ </a>
30
+ <a href="https://github.com/dipta007/dagger" target="_blank">
31
+ <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Code-black" style="display: inline-block; vertical-align: middle;"/>
32
+ </a>
33
 
34
+ ## Model Description
35
 
36
+ **DAGGER-4B-SFT** is a 4B-parameter model, supervised fine-tuned (SFT) from Gemma-3-4B-Instruct to solve Bangla math problems by generating a computational graph. It serves as the initialization for GRPO training and as a lightweight baseline.
37
+
38
+ ## Model Overview
39
+
40
+ | Attribute | Value |
41
+ |-----------|-------|
42
+ | Base Model | Gemma-3-4B-Instruct |
43
+ | Training | Supervised Fine-Tuning |
44
+ | Parameters | 4B |
45
+ | LoRA Rank | 64 |
46
+
47
+ ## Performance
48
+
49
+ | Dataset | Original | +Distractor | Drop |
50
+ |---------|----------|-------------|------|
51
+ | MGSM | 40.4 | 25.1 | 15.3 |
52
+ | MSVAMP | 65.0 | 42.4 | 22.7 |
53
+ | **Weighted Avg** | - | - | **44.3** |
54
+
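The Drop column can be read as Original minus +Distractor accuracy. The sketch below recomputes it from the rounded values in the table; the variable and function names are illustrative and do not come from the DAGGER codebase. Recomputing from the rounded scores gives 22.6 for MSVAMP instead of the reported 22.7, presumably because the reported figure was derived from unrounded accuracies (an assumption). The Weighted Avg row (44.3) matches the score listed under Related Models, so it appears to be an overall accuracy, not a drop.

```python
# Accuracy (%) copied from the Performance table: (original, with distractors).
scores = {
    "MGSM": (40.4, 25.1),
    "MSVAMP": (65.0, 42.4),
}


def accuracy_drop(original, distracted):
    """Absolute drop in accuracy points, rounded to one decimal place."""
    return round(original - distracted, 1)


# Recompute the Drop column from the rounded table values.
drops = {name: accuracy_drop(*pair) for name, pair in scores.items()}

for name, drop in drops.items():
    print(f"{name}: drop = {drop} points")
```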
55
+ ### Improvement from GRPO
56
+
57
+ | Model | Weighted Avg |
58
+ |-------|--------------|
59
+ | dagger-4B_SFT | 44.3 |
60
+ | dagger-4B_SFT_GRPO | **47.3** (+3.0) |
61
+
62
+ ## Quickstart
63
+
64
+ ```python
65
+ from transformers import AutoModelForCausalLM, AutoTokenizer
66
+
67
+ model_name = "dipta007/dagger-4B_SFT"
68
+
69
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
70
+ model = AutoModelForCausalLM.from_pretrained(
71
+     model_name,
72
+     torch_dtype="auto",
73
+     device_map="auto"
74
+ )
75
+
76
+ question = "মিনার কাছে ১০০টি কলম আছে। প্রতিটি কলমের দাম ৫ টাকা।"  # English gloss: "Mina has 100 pens. Each pen costs 5 taka."
77
+
78
+ messages = [
79
+     {"role": "system", "content": "You are an expert Bangla Math Reasoner. Solve by constructing a Computational Graph."},
80
+     {"role": "user", "content": question}
81
+ ]
82
+
83
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
84
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
85
+
86
+ outputs = model.generate(**inputs, max_new_tokens=1024)
87
+ response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
88
+ print(response)
89
+ ```
90
+
91
+ ## Training Configuration
92
+
93
+ | Parameter | Value |
94
+ |-----------|-------|
95
+ | LoRA Rank / Alpha | 64 / 128 |
96
+ | Global Batch Size | 256 |
97
+ | Epochs | 4 |
98
+ | Learning Rate | 1e-5 → 1e-6 |
99
+ | Precision | BF16 |
100
+
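To make the table concrete, here is a minimal sketch of how these hyperparameters relate to each other. It is not the training script. `SFTSettings`, `grad_accum_steps`, and `lr_at` are hypothetical names, the alpha/rank scaling is the standard LoRA convention, and the cosine shape of the 1e-5 → 1e-6 decay is an assumption because the table does not state the schedule type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTSettings:
    """Hypothetical container for the values in the Training Configuration table."""
    lora_rank: int = 64
    lora_alpha: int = 128
    global_batch_size: int = 256
    epochs: int = 4
    lr_start: float = 1e-5
    lr_end: float = 1e-6
    precision: str = "bf16"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def grad_accum_steps(self, per_device_batch: int, num_devices: int) -> int:
        # Accumulation steps needed to reach the global batch size.
        per_step = per_device_batch * num_devices
        if self.global_batch_size % per_step != 0:
            raise ValueError("global batch size must be divisible by per-step batch")
        return self.global_batch_size // per_step

    def lr_at(self, step: int, total_steps: int) -> float:
        # ASSUMPTION: cosine decay from lr_start to lr_end; the source only
        # gives the two endpoints, not the schedule shape.
        progress = step / total_steps
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return self.lr_end + (self.lr_start - self.lr_end) * cosine


cfg = SFTSettings()
print(cfg.lora_scaling)            # LoRA scaling factor
print(cfg.grad_accum_steps(8, 4))  # e.g. 8 samples per device on 4 devices
```

The per-device batch size and device count in the last line are placeholders; any split whose product divides 256 yields the same global batch size.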
101
+ ## When to Use This Model
102
+
103
+ - **GRPO initialization**: Starting point for policy optimization
104
+ - **Lightweight baseline**: When 12B models are too large
105
+ - **Ablation studies**: Comparing SFT vs. GRPO contributions
106
+
107
+ ## Related Models
108
+
109
+ | Model | Training | Weighted Avg |
110
+ |-------|----------|--------------|
111
+ | **dagger-4B_SFT** | SFT | 44.3 |
112
+ | [dagger-4B_SFT_GRPO](https://huggingface.co/dipta007/dagger-4B_SFT_GRPO) | SFT → GRPO | 47.3 |
113
+ | [dagger-4B_GRPO](https://huggingface.co/dipta007/dagger-4B_GRPO) | Base → GRPO | 29.3 |
114
+
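The comparison above can also be expressed as differences relative to this SFT checkpoint. The sketch below only does arithmetic on the Weighted Avg values from the table; the dictionary and variable names are illustrative.

```python
# Weighted Avg scores copied from the Related Models table.
weighted_avg = {
    "dagger-4B_SFT": 44.3,
    "dagger-4B_SFT_GRPO": 47.3,
    "dagger-4B_GRPO": 29.3,
}

baseline = weighted_avg["dagger-4B_SFT"]

# Difference of each model against the SFT checkpoint, in accuracy points.
deltas = {
    name: round(score - baseline, 1)
    for name, score in weighted_avg.items()
    if name != "dagger-4B_SFT"
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} vs. SFT")
```

On these numbers, GRPO on top of SFT gains 3.0 points, while GRPO applied directly to the base model lands 15.0 points below the SFT checkpoint.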
115
+ ## Citation
116
+
117
+ ```bibtex
118
+ will be updated
119
+ ```