Update Colab notebook: 1.5B model, scaled rewards, tuned hyperparameters ee8c2d4 nihalaninihal committed on 3 days ago
Align with Advanced Llama 3.2 GRPO LoRA reference notebook pattern c7d253a nihalaninihal Claude Opus 4.6 committed on 3 days ago
Fix format_comparison_metrics_html to accept run_comparison() dict directly d52b449 nihalaninihal Claude Opus 4.6 committed on 3 days ago
Align train.py and Colab notebook with official Unsloth+OpenEnv GRPO patterns e09a415 nihalaninihal Claude Opus 4.6 committed on 3 days ago
Update metrics format with drift/oversight tracking, add colab training notebook 5e0f2b1 nihalaninihal committed on 3 days ago