---
base_model: zai-org/GLM-4.5-Air
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multiNode-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.5-Air-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) (108B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

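
Because the released weights are a PEFT LoRA adapter on top of GLM-4.5-Air, they are loaded by first instantiating the base model and then attaching the adapter. The sketch below is a minimal, untested example: the adapter repository id is taken from this model card, while the dtype, `device_map`, `trust_remote_code`, and generation settings are assumptions to adjust for your hardware.

```python
# Minimal loading sketch: base GLM-4.5-Air + this LoRA adapter via peft.
# dtype/device_map/trust_remote_code choices are assumptions, not a tested recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.5-Air"
adapter_id = "AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain how connector integrations are structured in the Hyperswitch codebase."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```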
## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (144 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |

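
For orientation, a hedged sketch of what a full-shard, BF16 FSDP wrap like the one described above can look like in PyTorch is shown below. The actual launcher, wrapping policy, and process-group setup for this run are not published, so every detail here (including the `model` variable) is an assumption.

```python
# Hedged sketch of a full-shard, BF16 FSDP wrap matching the table above.
# Wrapping policy and launch configuration are assumptions, not the original setup.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

dist.init_process_group("nccl")  # e.g. launched with torchrun across 2 nodes × 8 GPUs
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# `model` is the PEFT-wrapped GLM-4.5-Air prepared elsewhere.
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)
```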
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 128 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 368 tensors |

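
Expressed as a `peft` `LoraConfig`, the table above corresponds to roughly the following sketch (the original training script is not published; `base_model` is a placeholder for the loaded GLM-4.5-Air causal LM):

```python
# LoRA adapter settings from the table above, expressed as a peft LoraConfig (sketch).
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# `base_model` is the loaded GLM-4.5-Air causal LM.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```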
### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (1 per device × 2 grad accumulation steps × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

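
The sequence length and chunk overlap suggest that long source files were split into overlapping 16,384-token windows with 2,048 tokens of overlap. The sketch below illustrates that interpretation; the actual preprocessing code is not published, so the `chunk_token_ids` helper and its behavior are assumptions.

```python
# Illustrative sliding-window chunking: 16,384-token chunks with 2,048 tokens of overlap.
# Assumed interpretation of "Chunk Overlap"; the original preprocessing script is not published.
def chunk_token_ids(token_ids, chunk_len=16_384, overlap=2_048):
    stride = chunk_len - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start:start + chunk_len])
        if start + chunk_len >= len(token_ids):
            break
    return chunks

# Example: a 40,000-token file becomes three overlapping chunks.
print([len(c) for c in chunk_token_ids(list(range(40_000)))])  # [16384, 16384, 11328]
```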
## Curriculum Learning Phases

The model was trained using a 3-phase curriculum learning approach, where each phase builds upon the previous (a sketch of the per-phase schedule follows the Training Summary table below):

### Phase 1: Foundation (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 12.7 hours |
| **Final Eval Loss** | 0.365 |
| **Final Eval Accuracy** | 88.8% |

### Phase 2: Evolution (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 24.7 hours |
| **Final Eval Loss** | 2.55 |
| **Final Eval Accuracy** | 40.8% |

*Note: The higher loss in Phase 2 is expected given the complexity of diff/commit patterns.*

### Phase 3: PR Mastery (1 epoch)
| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 6.9 hours |
| **Final Eval Loss** | 0.501 |
| **Final Eval Accuracy** | 90.2% |

### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 44.9 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.592 |
| **Final Train Loss** | 0.495 |
| **Final Perplexity** | 1.65 |

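
As referenced above, the per-phase schedule can be sketched as a simple loop over phases with decreasing learning rate and warmup, reusing the same adapter weights so each phase builds on the previous one. This is a hedged sketch using the `transformers` `Trainer`; the split names, `output_dir` layout, and dataset preparation are assumptions, since the original launch scripts are not published.

```python
# Hedged sketch of the phased schedule above (per-phase epochs/LR/warmup from the tables).
# Split names and dataset preparation are assumptions, not the original launch scripts.
from transformers import Trainer, TrainingArguments

phases = [
    # (name, epochs, learning_rate, warmup_ratio)
    ("phase1_foundation", 2, 2.5e-5, 0.15),
    ("phase2_evolution", 2, 2.0e-5, 0.10),
    ("phase3_pr_mastery", 1, 1.5e-5, 0.05),
]

for name, epochs, lr, warmup in phases:
    args = TrainingArguments(
        output_dir=f"checkpoints/{name}",
        num_train_epochs=epochs,
        learning_rate=lr,
        warmup_ratio=warmup,
        lr_scheduler_type="cosine",
        weight_decay=0.01,
        max_grad_norm=1.0,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        bf16=True,
    )
    # `model` and the tokenized per-phase datasets are prepared elsewhere; the same
    # PEFT model instance is reused so each phase continues from the previous one.
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=phase_datasets[name],
        eval_dataset=eval_datasets[name],
    )
    trainer.train()
```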
## Citation

If you use this model, please cite:

```bibtex
@misc{glm45air-hs-lora-curriculum,
  title     = {GLM-4.5-Air-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.5-Air-HS-LoRA-CurriculumLearning}
}
```

## Acknowledgments

- Base model: [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay