Neural-Hacker committed 9cc842e (verified, parent: 1352d51)

Update README.md

Files changed (1): README.md (+134 −3)
---
license: cc-by-nc-4.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-Math-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- qwen
- lora
- mathematics
- gsm8k
---

# OpenMath

Fine-tuning a Small Language Model (SLM) for Step-by-Step Math Reasoning

## Overview

OpenMath is an open-source project focused on fine-tuning a small language model for math reasoning using QLoRA (4-bit LoRA).

This repository contains only a LoRA adapter trained on GSM8K. Users must load the base model separately and attach the adapter (see the How to Use This Model section below).

The latest version of this model was trained on an AMD MI300X GPU using ROCm, showing that modern non-NVIDIA accelerators can run standard Hugging Face + PyTorch fine-tuning workflows.

---

## Base Model

Qwen/Qwen2.5-Math-1.5B

This repository does not contain the base model weights; they must be loaded from Hugging Face separately.

---

## Hardware Used (Latest Training Run)

- GPU: AMD MI300X (ROCm 7.0)
- VRAM: 192 GB
- Operating system: Ubuntu 24.04
- Framework: PyTorch + Hugging Face
- Backend: ROCm

## Dataset

- GSM8K (Grade School Math 8K)
- Training samples: 1,000
- Evaluation: full GSM8K test split (1,319 problems)

Loss was computed only on the solution portion of each example; prompt tokens were excluded via loss masking (a sketch of this appears below).

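One common way to implement solution-only masking is to set the prompt tokens' labels to -100, the value that Hugging Face's cross-entropy loss ignores. A minimal sketch, assuming prompt and solution are concatenated into one sequence (the original training script is not included in this repo):

```python
# Sketch of solution-only loss masking (illustrative, not the original script).
# Tokens labeled -100 are ignored by the cross-entropy loss in transformers.
def build_example(tokenizer, prompt: str, solution: str, max_len: int = 1024):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    solution_ids = tokenizer(solution, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + solution_ids + [tokenizer.eos_token_id])[:max_len]
    labels = ([-100] * len(prompt_ids) + solution_ids + [tokenizer.eos_token_id])[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```
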
---

## Training Configuration

- Method: QLoRA (4-bit)
- Quantization: NF4 with float16 compute
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj
- Max sequence length: 1024
- Batch size: 1
- Gradient accumulation steps: 16
- Effective batch size: 16 (1 × 16)
- Learning rate: 1e-4
- Optimizer: paged_adamw_8bit
- Scheduler: cosine
- Warmup ratio: 5%
- Epochs: 6

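These hyperparameters map directly onto the standard bitsandbytes and PEFT configuration objects. A sketch of that mapping (illustrative; the actual training script and data pipeline are not part of this repository, and the output path is a placeholder):

```python
# Illustrative reconstruction of the configuration above using transformers + peft.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # float16 compute
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="openmath-qlora",           # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,        # effective batch size 16
    learning_rate=1e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=6,
)
```
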
---

## Results

GSM8K accuracy (full test set): 750 / 1,319 correct = 56.86%.

This is significantly stronger than the earlier Colab T4 run and is a solid result for a 1.5B model trained with LoRA.

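Accuracy here means exact match on the final numeric answer; GSM8K reference solutions end in `#### <answer>`. A sketch of the scoring logic (the answer-extraction heuristic for model outputs is an assumption, since the original evaluation script is not included):

```python
# Illustrative GSM8K scoring: compare the last number in the model output
# against the number after '####' in the reference solution.
import re

def last_number(text: str):
    matches = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return matches[-1] if matches else None

def gsm8k_accuracy(predictions, references):
    correct = sum(
        last_number(pred) == last_number(ref.split("####")[-1])
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)  # e.g. 750 / 1319 ≈ 0.5686
```
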
---

## What This Repository Contains

- adapter_model.safetensors: LoRA weights
- adapter_config.json: LoRA configuration
- chat_template.jinja: chat formatting template
- tokenizer.json: tokenizer file
- tokenizer_config.json: tokenizer settings
- README.md: documentation

This repository does not include checkpoints, optimizer states, or the full base model weights.

---

## How to Use This Model

Load the base model Qwen/Qwen2.5-Math-1.5B from Hugging Face, then attach this LoRA adapter using PEFT. Generate answers using a prompt that includes an instruction, problem, and solution section, as in the sketch below.

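A minimal loading and generation sketch (the adapter repo id and the prompt wording below are placeholders; the exact prompt format comes from the bundled chat_template.jinja and the training setup):

```python
# Minimal usage sketch: load the 4-bit base model and attach the LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen2.5-Math-1.5B"
adapter_id = "<this-repo-id>"  # placeholder: the Hub id of this adapter repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # tokenizer files ship with the adapter
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Illustrative instruction/problem/solution prompt (assumed format).
prompt = (
    "### Instruction:\nSolve the math problem step by step.\n\n"
    "### Problem:\nNatalia sold clips to 48 of her friends in April, and then "
    "she sold half as many clips in May. How many clips did she sell altogether?\n\n"
    "### Solution:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
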
---

## Why This Matters

- This project demonstrates that the AMD MI300X can train modern language models with Hugging Face and QLoRA.
- It shows that high-quality math reasoning is possible at 1.5B parameters using efficient fine-tuning.
- It provides a lightweight adapter instead of requiring users to download a massive full model.

---

## Limitations

- The model can make reasoning mistakes.
- It should not be used for exams, assignments, or professional decisions.
- Performance depends heavily on prompt formatting.

---

## Future Work

- Train on 3,000 to 5,000 GSM8K samples.
- Add the SVAMP and ASDiv datasets.
- Improve decoding to reduce repetition.
- Experiment with multi-GPU scaling on MI300X.
- Add a Streamlit demo for interactive use.

---

## License

cc-by-nc-4.0