🚀 Qwen2.5 1.5B Python Coder

Supervised Fine-Tuning (SFT) + VERL Reinforcement Learning


🧠 Training Overview

🔹 Supervised Fine-Tuning (SFT)

🔹 Reinforcement Learning (VERL)


📊 Evaluation

Model Variant       Score
Baseline (Plain)    0.000
After SFT           0.165
After SFT + VERL    0.287

✨ Summary

  • SFT provides a strong initial boost in coding capability
  • VERL further improves performance by reinforcing test-passing behavior
  • The combined approach yields a ~74% relative improvement over SFT alone (0.165 → 0.287)
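
The ~74% figure can be reproduced from the evaluation scores above. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_improvement` is hypothetical and is not part of any training or evaluation code for this model.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from `before` to `after` as a fraction."""
    if before == 0:
        # A zero baseline makes the ratio undefined, which is why the gain
        # is reported relative to the SFT score and not the plain baseline.
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (after - before) / before


# Scores taken from the evaluation table in this card.
sft_score = 0.165
sft_verl_score = 0.287

gain = relative_improvement(sft_score, sft_verl_score)
print(f"{gain:.1%}")  # 73.9%
```

In absolute terms the gain from VERL is 0.122 points on the reported score. Because the plain baseline scores 0.000, a relative gain over that baseline cannot be computed.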