---
base_model: zai-org/GLM-4.6
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multiNode-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---

# GLM-4.6-HS-LoRA-CurriculumLearning

A LoRA fine-tuned version of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) (356B MoE) trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase using **Phased Curriculum Learning**.

## Model Description

This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.

### Key Features

- 🎯 **Domain-Specific**: Trained exclusively on the Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)

## Training Details

### Hardware Configuration

| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (full shard) |
| Precision | BF16 mixed precision |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 736 tensors |

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (micro-batch 1 × grad accumulation 2 × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |

## Curriculum Learning Phases

The model was trained using a 3-phase curriculum learning approach, where each phase builds on the previous one:

### Phase 1: Foundation (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 32.3 hours |
| **Final Eval Loss** | 0.349 |
| **Final Eval Accuracy** | 90.6% |

### Phase 2: Evolution (2 epochs)

| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 64.5 hours |
| **Final Eval Loss** | 2.46 |
| **Final Eval Accuracy** | 42.3% |

*Note: The higher loss in Phase 2 is expected, given the complexity of diff/commit patterns.*

### Phase 3: PR Mastery (1 epoch)

| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 17.8 hours |
| **Final Eval Loss** | 0.472 |
| **Final Eval Accuracy** | 90.8% |

### Training Summary

| Metric | Value |
|--------|-------|
| **Total Training Time** | 116.5 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.609 |
| **Final Train Loss** | 0.465 |
| **Final Perplexity** | 1.60 |

## Citation

If you use this model, please cite:

```bibtex
@misc{glm46-hs-lora-curriculum,
  title = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author = {Aditya Narayan},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```
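## Usage

A minimal inference sketch using `transformers` and `peft`. The base model is a 356B-parameter MoE, so multi-GPU hardware is assumed; `device_map="auto"` shards the weights across visible devices, and depending on your `transformers` version you may need `trust_remote_code=True` for the GLM architecture. The prompt is an illustrative placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "zai-org/GLM-4.6"
ADAPTER_ID = "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# BF16 matches the training precision; the base model does not fit on one GPU.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = "Explain how connector integrations are structured in Hyperswitch."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```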
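## Training Configuration Sketch

For reference, a sketch of the LoRA setup implied by the tables above, expressed with the PEFT API. The phase tuples mirror the per-phase learning rates, warmup ratios, and epoch counts; the split names and the `wrap_with_lora` helper are hypothetical, and the actual FSDP training loop (16,384-token chunks with 2,048-token overlap, cosine schedule) is not reproduced here.

```python
from peft import LoraConfig, TaskType, get_peft_model

def wrap_with_lora(base_model):
    """Apply the adapter configuration reported in the LoRA table above."""
    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=64,
        lora_alpha=128,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        bias="none",
    )
    return get_peft_model(base_model, config)

# Per-phase hyperparameters mirroring the curriculum tables above.
# Split names are hypothetical placeholders for the three dataset stages.
PHASES = [
    # (split, learning_rate, warmup_ratio, epochs)
    ("foundation", 2.5e-5, 0.15, 2),
    ("evolution", 2.0e-5, 0.10, 2),
    ("pr_mastery", 1.5e-5, 0.05, 1),
]
```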
## Acknowledgments

- Base model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay