---
base_model: zai-org/GLM-4.6
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multi-node-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.6-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) (356B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. Training used a 3-phase curriculum learning approach on 16 NVIDIA H200 GPUs across two nodes with PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

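## Usage

A minimal loading sketch, assuming a recent `transformers`/`peft` stack with GLM-4.6 support and hardware able to host the base model; the adapter id below is this repository, everything else is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "zai-org/GLM-4.6"
ADAPTER = "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning"  # this repo

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype="bfloat16",   # training was done in BF16
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the LoRA weights

# Hypothetical prompt; the model is tuned for Hyperswitch-specific questions
prompt = "Explain the role of the `router` crate in Hyperswitch.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
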
## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |

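In code, the full-shard BF16 setup corresponds roughly to the following wrapping. This is a sketch under stated assumptions (a `torchrun` launch across 2 nodes × 8 GPUs with `torch.distributed` already initialized), not the actual training script:

```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

def wrap_for_fsdp(model: torch.nn.Module) -> FSDP:
    """Shard the (PEFT-wrapped) model across all 16 ranks.

    Assumes torch.distributed is initialized, e.g. via
    `torchrun --nnodes 2 --nproc-per-node 8 train.py`.
    """
    bf16 = MixedPrecision(              # "BF16 Mixed Precision" above
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.FULL_SHARD,  # "Full Shard" above
        mixed_precision=bf16,
        device_id=torch.cuda.current_device(),
    )
```
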
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Tensors | 736 |

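The corresponding `peft` configuration would look roughly like this (a sketch built from the table above, not the exact training code):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Values from the table above; everything else is left at peft defaults
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("zai-org/GLM-4.6", torch_dtype="bfloat16")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # LoRA adds two tensors (A and B) per targeted projection
```
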
### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (1 per device × 2 grad accum steps × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

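The sequence length and chunk overlap suggest that long files were split into overlapping 16,384-token windows, so context spanning a chunk boundary is seen twice. A minimal sketch of that kind of chunking (an assumed preprocessing step, not the published pipeline):

```python
def chunk_tokens(token_ids: list[int],
                 seq_len: int = 16_384,
                 overlap: int = 2_048) -> list[list[int]]:
    """Split a token sequence into windows of `seq_len` tokens, where each
    window repeats the last `overlap` tokens of the previous one."""
    stride = seq_len - overlap  # 14,336 new tokens per chunk
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start:start + seq_len])
    return chunks
```
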
## Curriculum Learning Phases

The model was trained using a 3-phase curriculum learning approach in which each phase builds on the previous one. The three phases are detailed below.

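In outline, the phases run sequentially, carrying the same adapter weights forward while lowering the learning rate and warmup ratio. A minimal sketch assuming a Hugging Face `Trainer`-style loop; `load_phase_dataset` and `make_trainer` are hypothetical helpers, not part of the released code:

```python
# Per-phase settings, taken from the phase tables below
PHASES = [
    # (name,        epochs, learning_rate, warmup_ratio)
    ("foundation",  2,      2.5e-5,        0.15),
    ("evolution",   2,      2.0e-5,        0.10),
    ("pr_mastery",  1,      1.5e-5,        0.05),
]

def run_curriculum(model, load_phase_dataset, make_trainer):
    """Run each phase in order, updating the same LoRA adapter throughout.

    `load_phase_dataset(name)` returns (train_ds, eval_ds) for a phase;
    `make_trainer(...)` builds a Trainer with the shared settings from
    the hyperparameter table above (cosine schedule, BF16, etc.).
    """
    for name, epochs, lr, warmup in PHASES:
        train_ds, eval_ds = load_phase_dataset(name)
        trainer = make_trainer(
            model=model,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            num_train_epochs=epochs,
            learning_rate=lr,
            warmup_ratio=warmup,
        )
        trainer.train()  # the adapter weights are updated in place
    return model
```
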
### Phase 1: Foundation (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 32.3 hours |
| **Final Eval Loss** | 0.349 |
| **Final Eval Accuracy** | 90.6% |

### Phase 2: Evolution (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 64.5 hours |
| **Final Eval Loss** | 2.46 |
| **Final Eval Accuracy** | 42.3% |

*Note: Higher loss in Phase 2 is expected due to the complexity of diff/commit patterns.*

### Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 17.8 hours |
| **Final Eval Loss** | 0.472 |
| **Final Eval Accuracy** | 90.8% |

### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 116.5 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.609 |
| **Final Train Loss** | 0.465 |
| **Final Perplexity** | 1.60 |

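The reported perplexity is presumably the exponential of the final train loss: exp(0.465) ≈ 1.59, which rounds to the 1.60 in the table.
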
## Citation

If you use this model, please cite:

```bibtex
@misc{glm46-hs-lora-curriculum,
  title     = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```

## Acknowledgments

- Base model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay