Issactoto/qwen2.5-1.5b-sft-verl-python

🚀 Qwen2.5 1.5B Python Coder

Supervised Fine-Tuning (SFT) + VERL Reinforcement Learning

🧠 Training Overview

🔹 Supervised Fine-Tuning (SFT)

Hardware: 2× T4 GPUs (Kaggle)
Dataset: https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca

🔹 Reinforcement Learning (VERL)

Platform: L4 GPU (Google Colab)
Samples: 2,000
Dataset: https://huggingface.co/datasets/KodCode/KodCode-V1-SFT-4o
Reward Function:
- Based on the proportion of unit tests passed
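A reward of this shape can be sketched in a few lines. The card does not include the actual reward code, so everything below is an illustrative assumption: the function name `pass_rate_reward`, the convention that each test is a standalone `assert` snippet, and the choice to score an empty test list as 0.0 are all hypothetical and not part of VERL's API.

```python
from typing import Sequence


def pass_rate_reward(solution_code: str, test_snippets: Sequence[str]) -> float:
    """Return the fraction of test snippets that run without raising.

    Hypothetical helper: a real training setup should run generated code in a
    sandbox with a timeout, which this sketch does not do.
    """
    if not test_snippets:
        return 0.0  # assumption: no tests means no reward signal

    passed = 0
    for snippet in test_snippets:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(solution_code, namespace)  # define the candidate's functions
            exec(snippet, namespace)        # run one assert-style unit test
        except Exception:
            continue  # syntax errors, failed asserts, and runtime errors all count as a fail
        passed += 1

    return passed / len(test_snippets)


# Example: two of the three tests pass, so the reward is 2/3.
solution = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(2, 2) == 5",
]
reward = pass_rate_reward(solution, tests)
```

A fractional reward like this gives partial credit for partially correct solutions, which tends to provide a denser learning signal than an all-or-nothing pass/fail score.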

📊 Evaluation

Benchmark: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

Model Variant	Score
Baseline (Plain)	0.000
After SFT	0.165
After SFT + VERL	0.287

✨ Summary

- SFT provides a strong initial boost in coding capability (0.000 → 0.165).
- VERL further improves performance by reinforcing test-passing behavior (0.165 → 0.287).
- The combined approach yields a ~74% relative improvement over SFT alone.
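The ~74% figure is a relative gain computed from the scores in the evaluation table. A minimal sketch of the arithmetic follows; the helper name `relative_improvement` is introduced here only for illustration.

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    if before == 0:
        # A relative gain over a score of zero is undefined, which is why the
        # comparison is made against the SFT score rather than the baseline.
        raise ValueError("relative improvement is undefined when 'before' is 0")
    return (after - before) / before


sft_score = 0.165       # "After SFT" row
sft_verl_score = 0.287  # "After SFT + VERL" row

gain = relative_improvement(sft_score, sft_verl_score)
print(f"{gain:.1%}")  # prints 73.9%
```

In absolute terms the gain from the VERL stage is 0.122 points on the same benchmark score.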

Format: Safetensors
Model size: 2B params
Tensor type: BF16
