Qwen2.5 1.5B Python Coder
Supervised Fine-Tuning (SFT) + VERL Reinforcement Learning
Training Overview
Supervised Fine-Tuning (SFT)
- Hardware: 2× T4 GPUs (Kaggle)
- Dataset: https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca
Reinforcement Learning (VERL)
- Platform: L4 GPU (Google Colab)
- Samples: 2,000
- Dataset: https://huggingface.co/datasets/KodCode/KodCode-V1-SFT-4o
- Reward Function:
  - Based on the proportion of unit tests passed
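
A reward based on the proportion of unit tests passed could look roughly like the sketch below. This is an illustration only, not the reward code used for this model: the function name `unit_test_reward`, the assert-string test format, and the use of plain `exec` are all assumptions, and the card does not say how the reward was wired into VERL.

```python
def unit_test_reward(solution_code: str, test_cases: list[str]) -> float:
    """Hypothetical reward: fraction of test snippets that run without raising.

    Assumes each test case is a self-contained snippet (for example a single
    assert statement) that refers to names defined by solution_code.
    """
    if not test_cases:
        return 0.0

    passed = 0
    for test in test_cases:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            # NOTE: exec runs arbitrary code. A real training setup would
            # sandbox this and enforce a timeout; both are omitted here.
            exec(solution_code, namespace)  # define the candidate's functions
            exec(test, namespace)           # run one test against them
            passed += 1
        except Exception:
            # Syntax errors, runtime errors, and failed asserts all count as a fail.
            pass
    return passed / len(test_cases)


solution = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(2, 2) == 5",  # deliberately wrong expectation
]
print(unit_test_reward(solution, tests))
```

In this example two of the three asserts hold, so the reward is 2/3. A graded reward like this gives partial credit to partially correct solutions, whereas an all-or-nothing pass/fail signal would score them the same as a complete failure.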
Evaluation
| Model Variant | Score |
|---|---|
| Baseline (Plain) | 0.000 |
| After SFT | 0.165 |
| After SFT + VERL | 0.287 |
Summary
- SFT provides the initial boost in coding capability, raising the score from 0.000 to 0.165
- VERL further improves performance by reinforcing test-passing behavior
- The combined approach yields a ~74% relative improvement over SFT alone (0.165 to 0.287)
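
The ~74% figure can be checked directly from the scores in the evaluation table. The snippet below is a small sketch of that arithmetic; the helper name `relative_improvement` is introduced here for illustration.

```python
# Scores copied from the evaluation table.
scores = {
    "Baseline (Plain)": 0.000,
    "After SFT": 0.165,
    "After SFT + VERL": 0.287,
}


def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`; undefined when before is 0."""
    if before == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (after - before) / before


gain = relative_improvement(scores["After SFT"], scores["After SFT + VERL"])
print(f"{gain:.1%}")  # 73.9%
```

Because the plain baseline scores 0.000, a relative improvement over the baseline is undefined, which is why the comparison is stated against SFT alone.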