---
base_model: zai-org/GLM-4.6
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multiNode-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.6-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) (356B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (full shard) |
| Precision | BF16 mixed precision |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 736 tensors |

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (micro-batch 1 × grad accumulation 2 × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

## Curriculum Learning Phases

The model was trained using a 3-phase curriculum learning approach, where each phase builds on the previous one:

### Phase 1: Foundation (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 32.3 hours |
| **Final Eval Loss** | 0.349 |
| **Final Eval Accuracy** | 90.6% |

### Phase 2: Evolution (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 64.5 hours |
| **Final Eval Loss** | 2.46 |
| **Final Eval Accuracy** | 42.3% |

*Note: The higher loss in Phase 2 is expected, given the complexity of diff/commit patterns.*

### Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 17.8 hours |
| **Final Eval Loss** | 0.472 |
| **Final Eval Accuracy** | 90.8% |

### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 116.5 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.609 |
| **Final Train Loss** | 0.465 |
| **Final Perplexity** | 1.60 |

## Citation

If you use this model, please cite:

```bibtex
@misc{glm46-hs-lora-curriculum,
  title = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author = {Aditya Narayan},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```
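## Usage

A minimal inference sketch using `transformers` and `peft`. The base model is a 356B-parameter MoE, so multi-GPU hardware is assumed; `device_map="auto"` shards the weights across visible devices, and depending on your `transformers` version you may need `trust_remote_code=True` for the GLM architecture. The prompt is an illustrative placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "zai-org/GLM-4.6"
ADAPTER_ID = "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# BF16 matches the training precision; the base model does not fit on one GPU.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = "Explain how connector integrations are structured in Hyperswitch."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```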
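## Training Configuration Sketch

For reference, a sketch of the LoRA setup implied by the tables above, expressed with the PEFT API. The phase tuples mirror the per-phase learning rates, warmup ratios, and epoch counts; the split names and the `wrap_with_lora` helper are hypothetical, and the actual FSDP training loop (16,384-token chunks with 2,048-token overlap, cosine schedule) is not reproduced here.

```python
from peft import LoraConfig, TaskType, get_peft_model

def wrap_with_lora(base_model):
    """Apply the adapter configuration reported in the LoRA table above."""
    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=64,
        lora_alpha=128,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        bias="none",
    )
    return get_peft_model(base_model, config)

# Per-phase hyperparameters mirroring the curriculum tables above.
# Split names are hypothetical placeholders for the three dataset stages.
PHASES = [
    # (split, learning_rate, warmup_ratio, epochs)
    ("foundation", 2.5e-5, 0.15, 2),
    ("evolution", 2.0e-5, 0.10, 2),
    ("pr_mastery", 1.5e-5, 0.05, 1),
]
```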
## Acknowledgments

- Base model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay