---
base_model: zai-org/GLM-4.5-Air
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multiNode-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.5-Air-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) (108B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

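
Because the released weights are a PEFT LoRA adapter on top of GLM-4.5-Air, they are loaded by first instantiating the base model and then attaching the adapter. The sketch below is a minimal, untested example: the adapter repository id is taken from this model card, while the dtype, `device_map`, `trust_remote_code`, and generation settings are assumptions to adjust for your hardware.

```python
# Minimal loading sketch: base GLM-4.5-Air + this LoRA adapter via peft.
# dtype/device_map/trust_remote_code choices are assumptions, not a tested recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.5-Air"
adapter_id = "AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain how connector integrations are structured in the Hyperswitch codebase."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```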
## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (144 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |

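
For orientation, a hedged sketch of what a full-shard, BF16 FSDP wrap like the one described above can look like in PyTorch is shown below. The actual launcher, wrapping policy, and process-group setup for this run are not published, so every detail here (including the `model` variable) is an assumption.

```python
# Hedged sketch of a full-shard, BF16 FSDP wrap matching the table above.
# Wrapping policy and launch configuration are assumptions, not the original setup.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

dist.init_process_group("nccl")  # e.g. launched with torchrun across 2 nodes × 8 GPUs
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# `model` is the PEFT-wrapped GLM-4.5-Air prepared elsewhere.
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)
```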
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 128 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 368 tensors |

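
Expressed as a `peft` `LoraConfig`, the table above corresponds to roughly the following sketch (the original training script is not published; `base_model` is a placeholder for the loaded GLM-4.5-Air causal LM):

```python
# LoRA adapter settings from the table above, expressed as a peft LoraConfig (sketch).
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# `base_model` is the loaded GLM-4.5-Air causal LM.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```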
### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (1 per device × 2 grad accumulation steps × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

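
The sequence length and chunk overlap suggest that long source files were split into overlapping 16,384-token windows with 2,048 tokens of overlap. The sketch below illustrates that interpretation; the actual preprocessing code is not published, so the `chunk_token_ids` helper and its behavior are assumptions.

```python
# Illustrative sliding-window chunking: 16,384-token chunks with 2,048 tokens of overlap.
# Assumed interpretation of "Chunk Overlap"; the original preprocessing script is not published.
def chunk_token_ids(token_ids, chunk_len=16_384, overlap=2_048):
    stride = chunk_len - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start:start + chunk_len])
        if start + chunk_len >= len(token_ids):
            break
    return chunks

# Example: a 40,000-token file becomes three overlapping chunks.
print([len(c) for c in chunk_token_ids(list(range(40_000)))])  # [16384, 16384, 11328]
```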
## Curriculum Learning Phases

The model was trained using a 3-phase curriculum learning approach, where each phase builds upon the previous (a sketch of the per-phase schedule follows the Training Summary table below):

### Phase 1: Foundation (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 12.7 hours |
| **Final Eval Loss** | 0.365 |
| **Final Eval Accuracy** | 88.8% |

### Phase 2: Evolution (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 24.7 hours |
| **Final Eval Loss** | 2.55 |
| **Final Eval Accuracy** | 40.8% |

*Note: The higher loss in Phase 2 is expected given the complexity of diff/commit patterns.*

### Phase 3: PR Mastery (1 epoch)
| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 6.9 hours |
| **Final Eval Loss** | 0.501 |
| **Final Eval Accuracy** | 90.2% |

### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 44.9 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.592 |
| **Final Train Loss** | 0.495 |
| **Final Perplexity** | 1.65 |

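
As referenced above, the per-phase schedule can be sketched as a simple loop over phases with decreasing learning rate and warmup, reusing the same adapter weights so each phase builds on the previous one. This is a hedged sketch using the `transformers` `Trainer`; the split names, `output_dir` layout, and dataset preparation are assumptions, since the original launch scripts are not published.

```python
# Hedged sketch of the phased schedule above (per-phase epochs/LR/warmup from the tables).
# Split names and dataset preparation are assumptions, not the original launch scripts.
from transformers import Trainer, TrainingArguments

phases = [
    # (name, epochs, learning_rate, warmup_ratio)
    ("phase1_foundation", 2, 2.5e-5, 0.15),
    ("phase2_evolution", 2, 2.0e-5, 0.10),
    ("phase3_pr_mastery", 1, 1.5e-5, 0.05),
]

for name, epochs, lr, warmup in phases:
    args = TrainingArguments(
        output_dir=f"checkpoints/{name}",
        num_train_epochs=epochs,
        learning_rate=lr,
        warmup_ratio=warmup,
        lr_scheduler_type="cosine",
        weight_decay=0.01,
        max_grad_norm=1.0,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        bf16=True,
    )
    # `model` and the tokenized per-phase datasets are prepared elsewhere; the same
    # PEFT model instance is reused so each phase continues from the previous one.
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=phase_datasets[name],
        eval_dataset=eval_datasets[name],
    )
    trainer.train()
```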
## Citation

If you use this model, please cite:

```bibtex
@misc{glm45air-hs-lora-curriculum,
  title     = {GLM-4.5-Air-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning}
}
```

## Acknowledgments

- Base model: [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay