---
base_model: zai-org/GLM-4.6
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multi-node-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---
# GLM-4.6-HS-LoRA-CurriculumLearning
A LoRA fine-tune of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) (a 355B-parameter MoE model), trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase with **Phased Curriculum Learning**.
## Model Description
This model is specifically trained to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.
### Key Features
- 🎯 **Domain-Specific**: Trained exclusively on Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)
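### Usage

The adapter can be attached to the base model with PEFT. The snippet below is a minimal sketch, assuming the adapter repository id from this card and enough GPU memory (or offloading) to host the MoE base model; it is not a verified end-to-end recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.6"
adapter_id = "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # the adapter was trained in BF16
    device_map="auto",            # requires `accelerate`; shards/offloads as needed
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain how payment routing is structured in the Hyperswitch codebase."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```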
## Training Details
### Hardware Configuration
| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |
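As a rough illustration of this setup, the sketch below wraps a model with PyTorch FSDP in full-shard mode and a BF16 mixed-precision policy. It is a simplified stand-in for the actual training script and assumes the process group has already been initialized by a launcher such as `torchrun` across the 16 ranks.

```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# BF16 for parameters, gradient reduction, and buffers.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

def wrap_with_fsdp(model: torch.nn.Module) -> FSDP:
    # FULL_SHARD shards parameters, gradients, and optimizer state
    # across all ranks (here, 2 nodes x 8 GPUs = 16 ranks).
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=bf16_policy,
        device_id=torch.cuda.current_device(),
    )
```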
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable LoRA Tensors | 736 |
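These values map directly onto a PEFT `LoraConfig`. A minimal sketch (the `bias` and `task_type` settings are assumptions, not taken from this card):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,                                                      # LoRA rank
    lora_alpha=128,                                            # scaling: alpha / r = 2.0
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # attention projections only
    bias="none",                                               # assumed default
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()  # reports the trainable LoRA tensors
```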
### Training Hyperparameters
| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (per-device batch 1 × grad accum 2 × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |
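The sequence length and chunk overlap imply a sliding-window split of long files into 16,384-token windows that share 2,048 tokens of context. The exact preprocessing code is not published here, so the following is only an assumed illustration of that chunking:

```python
def chunk_token_ids(token_ids, max_len=16_384, overlap=2_048):
    """Split a long token sequence into windows of at most `max_len` tokens,
    with consecutive windows sharing `overlap` tokens of context."""
    stride = max_len - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start : start + max_len])
    return chunks

# Example: a 40,000-token file yields windows starting at 0, 14336, 28672.
```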
## Curriculum Learning Phases
Training proceeded through three curriculum phases, each building on the previous one:
### Phase 1: Foundation (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 32.3 hours |
| **Final Eval Loss** | 0.349 |
| **Final Eval Accuracy** | 90.6% |
### Phase 2: Evolution (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 64.5 hours |
| **Final Eval Loss** | 2.46 |
| **Final Eval Accuracy** | 42.3% |
*Note: Higher loss in Phase 2 is expected due to the complexity of diff/commit patterns.*
### Phase 3: PR Mastery (1 epoch)
| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 17.8 hours |
| **Final Eval Loss** | 0.472 |
| **Final Eval Accuracy** | 90.8% |
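The same LoRA adapter is carried through all three phases, with the learning rate and warmup ratio stepped down as the data becomes more specialized. The sketch below only restates that schedule; `train_phase` is a hypothetical placeholder for the actual FSDP training loop, which is not published here.

```python
# Phase schedule taken from the tables above; `train_phase` is hypothetical.
PHASES = [
    {"name": "foundation", "epochs": 2, "lr": 2.5e-5, "warmup_ratio": 0.15},
    {"name": "evolution",  "epochs": 2, "lr": 2.0e-5, "warmup_ratio": 0.10},
    {"name": "pr_mastery", "epochs": 1, "lr": 1.5e-5, "warmup_ratio": 0.05},
]

def run_curriculum(model, datasets, train_phase):
    for phase in PHASES:
        # The adapter weights persist across phases, so each phase
        # resumes from the checkpoint produced by the previous one.
        model = train_phase(
            model,
            datasets[phase["name"]],
            num_epochs=phase["epochs"],
            learning_rate=phase["lr"],
            warmup_ratio=phase["warmup_ratio"],
            lr_scheduler_type="cosine",   # shared settings from the
            weight_decay=0.01,            # hyperparameter table above
            max_grad_norm=1.0,
        )
    return model
```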
### Training Summary
| Metric | Value |
|--------|-------|
| **Total Training Time** | 116.5 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.609 |
| **Final Train Loss** | 0.465 |
| **Final Perplexity** | 1.60 |
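*For reference, perplexity is the exponential of the cross-entropy loss, so the final value follows from exp(0.465) ≈ 1.6.*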
## Citation
If you use this model, please cite:
```bibtex
@misc{glm46-hs-lora-curriculum,
  title     = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```
## Acknowledgments
- Base model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay