---
base_model: zai-org/GLM-4.5-Air
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multiNode-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.5-Air-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) (108B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained to understand and assist with the Hyperswitch payment orchestration codebase. It was fine-tuned with a three-phase curriculum learning approach on multi-node H200 GPUs using PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

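Loading the adapter follows the standard PEFT pattern. The sketch below is illustrative, assuming recent `transformers` and `peft`; the model and adapter IDs are from this card, and the 108B MoE base model requires several large GPUs (or quantization) to load:

```python
# Illustrative loader for this adapter (assumed workflow, not an official script).
# Loading is wrapped in a function because the 108B base model is a heavy download.
BASE_ID = "zai-org/GLM-4.5-Air"
ADAPTER_ID = "AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning"

def load_model():
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype="bfloat16", device_map="auto"
    )
    # PeftModel attaches the LoRA weights on top of the frozen base model.
    return tokenizer, PeftModel.from_pretrained(base, ADAPTER_ID)
```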
## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 128 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 368 tensors |

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (1 per-GPU × 2 grad accumulation × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

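The 16,384-token sequence length with 2,048-token overlap implies a sliding-window split of long files into training chunks. A minimal sketch of that chunking (a hypothetical helper; the actual preprocessing code is not published in this card):

```python
# Hypothetical sliding-window chunker matching the table: 16,384-token
# windows that overlap by 2,048 tokens (stride = 14,336 new tokens per window).
def chunk_tokens(tokens, chunk_len=16_384, overlap=2_048):
    stride = chunk_len - overlap
    return [
        tokens[start:start + chunk_len]
        for start in range(0, max(len(tokens) - overlap, 1), stride)
    ]

# A 40,000-token file becomes three windows: two full, one partial tail.
print([len(c) for c in chunk_tokens(list(range(40_000)))])  # [16384, 16384, 11328]
```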
## Curriculum Learning Phases

The model was trained with a three-phase curriculum, each phase building on the previous one:

### Phase 1: Foundation (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 12.7 hours |
| **Final Eval Loss** | 0.365 |
| **Final Eval Accuracy** | 88.8% |

### Phase 2: Evolution (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 24.7 hours |
| **Final Eval Loss** | 2.55 |
| **Final Eval Accuracy** | 40.8% |

*Note: The higher loss in Phase 2 is expected, since diff/commit patterns are harder to predict than raw source files.*

### Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 6.9 hours |
| **Final Eval Loss** | 0.501 |
| **Final Eval Accuracy** | 90.2% |

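Restating the three phase tables as data makes the schedule's shape explicit: learning rate, warmup ratio, and epochs all taper as the curriculum moves from raw structure toward PR-level reasoning (an illustrative summary, not the actual training script):

```python
# Curriculum schedule from the phase tables above (illustrative restatement).
PHASES = [
    {"name": "Foundation", "epochs": 2, "lr": 2.5e-5, "warmup_ratio": 0.15},
    {"name": "Evolution",  "epochs": 2, "lr": 2.0e-5, "warmup_ratio": 0.10},
    {"name": "PR Mastery", "epochs": 1, "lr": 1.5e-5, "warmup_ratio": 0.05},
]

# Later phases fine-tune more gently: LR and warmup strictly decrease.
assert all(a["lr"] > b["lr"] for a, b in zip(PHASES, PHASES[1:]))
```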
### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 44.9 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.592 |
| **Final Train Loss** | 0.495 |
| **Final Perplexity** | 1.65 |

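Perplexity is the exponential of the cross-entropy loss, so the reported 1.65 lines up with a final loss around 0.50 (an arithmetic check only; which loss the figure was computed from is an assumption):

```python
import math

# perplexity = exp(cross-entropy loss)
print(round(math.exp(0.501), 2))  # Phase 3 final eval loss -> 1.65
print(round(math.exp(0.495), 2))  # final train loss        -> 1.64
```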
## Citation

If you use this model, please cite:

```bibtex
@misc{glm45air-hs-lora-curriculum,
  title     = {GLM-4.5-Air-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning}
}
```

## Acknowledgments

- Base model: [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay