---
base_model: zai-org/GLM-4.6
tags:
- rust
- Hyperswitch
- LoRA
- CPT
- Causal-LM
- code-generation
- phased-training
- multi-node-training
- curriculum-learning
- FSDP
pipeline_tag: text-generation
language:
- en
datasets:
- AdityaNarayan/HS-Repo-Curriculum-Learning
library_name: peft
license: apache-2.0
---
# GLM-4.6-HS-LoRA-CurriculumLearning
A LoRA fine-tune of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) (a 355B-parameter MoE model), trained on the [Hyperswitch](https://github.com/juspay/hyperswitch) codebase with **Phased Curriculum Learning**.
## Model Description
This model is specifically trained to understand and assist with the Hyperswitch payment orchestration codebase. It was trained using a 3-phase curriculum learning approach on multi-node H200 GPUs with PyTorch FSDP.
### Key Features
- 🎯 **Domain-Specific**: Trained exclusively on Hyperswitch Rust codebase
- 📚 **Curriculum Learning**: 3-phase progressive training (Foundation → Evolution → PR Mastery)
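### Usage

The adapter can be attached to the base model with PEFT. The snippet below is a minimal sketch, assuming the adapter repository id from this card and enough GPU memory (or offloading) to host the MoE base model; it is not a verified end-to-end recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.6"
adapter_id = "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # the adapter was trained in BF16
    device_map="auto",            # requires `accelerate`; shards/offloads as needed
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain how payment routing is structured in the Hyperswitch codebase."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```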
## Training Details
### Hardware Configuration
| Component | Specification |
|-----------|---------------|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |
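As a rough illustration of this setup, the sketch below wraps a model with PyTorch FSDP in full-shard mode and a BF16 mixed-precision policy. It is a simplified stand-in for the actual training script and assumes the process group has already been initialized by a launcher such as `torchrun` across the 16 ranks.

```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# BF16 for parameters, gradient reduction, and buffers.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

def wrap_with_fsdp(model: torch.nn.Module) -> FSDP:
    # FULL_SHARD shards parameters, gradients, and optimizer state
    # across all ranks (here, 2 nodes x 8 GPUs = 16 ranks).
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=bf16_policy,
        device_id=torch.cuda.current_device(),
    )
```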
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable LoRA Tensors | 736 |
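These values map directly onto a PEFT `LoraConfig`. A minimal sketch (the `bias` and `task_type` settings are assumptions, not taken from this card):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,                                                      # LoRA rank
    lora_alpha=128,                                            # scaling: alpha / r = 2.0
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # attention projections only
    bias="none",                                               # assumed default
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()  # reports the trainable LoRA tensors
```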
### Training Hyperparameters
| Parameter | Value |
|-----------|-------|
| Effective Batch Size | 32 (per-device batch 1 × grad accum 2 × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |
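The sequence length and chunk overlap imply a sliding-window split of long files into 16,384-token windows that share 2,048 tokens of context. The exact preprocessing code is not published here, so the following is only an assumed illustration of that chunking:

```python
def chunk_token_ids(token_ids, max_len=16_384, overlap=2_048):
    """Split a long token sequence into windows of at most `max_len` tokens,
    with consecutive windows sharing `overlap` tokens of context."""
    stride = max_len - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start : start + max_len])
    return chunks

# Example: a 40,000-token file yields windows starting at 0, 14336, 28672.
```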
## Curriculum Learning Phases
Training proceeded through three curriculum phases, each building on the previous one:
### Phase 1: Foundation (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Codebase structure and file patterns |
| **Samples** | 9,293 train / 512 eval |
| **Learning Rate** | 2.5e-5 |
| **Warmup Ratio** | 0.15 |
| **Training Time** | 32.3 hours |
| **Final Eval Loss** | 0.349 |
| **Final Eval Accuracy** | 90.6% |
### Phase 2: Evolution (2 epochs)
| Metric | Value |
|--------|-------|
| **Dataset** | Commit patterns and code changes |
| **Samples** | 16,622 train / 1,545 eval |
| **Learning Rate** | 2.0e-5 |
| **Warmup Ratio** | 0.10 |
| **Training Time** | 64.5 hours |
| **Final Eval Loss** | 2.46 |
| **Final Eval Accuracy** | 42.3% |
*Note: Higher loss in Phase 2 is expected due to the complexity of diff/commit patterns.*
### Phase 3: PR Mastery (1 epoch)
| Metric | Value |
|--------|-------|
| **Dataset** | Pull request and review patterns |
| **Samples** | 9,797 train / 509 eval |
| **Learning Rate** | 1.5e-5 |
| **Warmup Ratio** | 0.05 |
| **Training Time** | 17.8 hours |
| **Final Eval Loss** | 0.472 |
| **Final Eval Accuracy** | 90.8% |
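The same LoRA adapter is carried through all three phases, with the learning rate and warmup ratio stepped down as the data becomes more specialized. The sketch below only restates that schedule; `train_phase` is a hypothetical placeholder for the actual FSDP training loop, which is not published here.

```python
# Phase schedule taken from the tables above; `train_phase` is hypothetical.
PHASES = [
    {"name": "foundation", "epochs": 2, "lr": 2.5e-5, "warmup_ratio": 0.15},
    {"name": "evolution",  "epochs": 2, "lr": 2.0e-5, "warmup_ratio": 0.10},
    {"name": "pr_mastery", "epochs": 1, "lr": 1.5e-5, "warmup_ratio": 0.05},
]

def run_curriculum(model, datasets, train_phase):
    for phase in PHASES:
        # The adapter weights persist across phases, so each phase
        # resumes from the checkpoint produced by the previous one.
        model = train_phase(
            model,
            datasets[phase["name"]],
            num_epochs=phase["epochs"],
            learning_rate=phase["lr"],
            warmup_ratio=phase["warmup_ratio"],
            lr_scheduler_type="cosine",   # shared settings from the
            weight_decay=0.01,            # hyperparameter table above
            max_grad_norm=1.0,
        )
    return model
```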
### Training Summary
| Metric | Value |
|--------|-------|
| **Total Training Time** | 116.5 hours |
| **Total Steps** | 1,926 |
| **Total Epochs** | 5 (2 + 2 + 1) |
| **Initial Train Loss** | 0.609 |
| **Final Train Loss** | 0.465 |
| **Final Perplexity** | 1.60 |
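*For reference, perplexity is the exponential of the cross-entropy loss, so the final value follows from exp(0.465) ≈ 1.6.*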
## Citation
If you use this model, please cite:
```bibtex
@misc{glm46-hs-lora-curriculum,
  title     = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```
## Acknowledgments
- Base model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) by Zhipu AI
- Training framework: PyTorch FSDP + PEFT
- Dataset: Hyperswitch open-source repository by Juspay