a3-rl-laion_exp_rpt_methods2test-large-v2
RL (RLOO-n, token_mean) finetune of a Qwen3-8B base (laion/GLM-4_7-swesmith...-fixthink) on the methods2test-large-v2 agentic task set via SkyRL terminal-bench / terminus-2.
PARTIAL CHECKPOINT: cancelled mid-run. SLURM job 589447 was cancelled on 2026-06-06 because the a3 experiment series was concluded as uninformative. This artifact is the latest HF-ready export, global_step_50; a raw FSDP checkpoint exists at step 52/53 but was not exported. Training was healthy at cancellation: avg_raw_reward rose from ~0.43 to ~0.70, pass@8 from ~0.81 to ~0.88, and entropy stayed stable (~0.07 to 0.11) with no collapse.
- Config: hpc/skyrl_yaml/jupiter/56GPU_base.yaml (a3 chain, 14 nodes x 4 GPU)
- Algorithm: rloo_n, eps_clip 0.2/0.05, lr 8e-6, no KL loss
- Status: partial / a3-series-concluded
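
For readers unfamiliar with the `rloo_n` and `token_mean` settings named above, the following is a minimal sketch of the textbook leave-one-out (RLOO) baseline and of token-level loss averaging. It is not taken from the SkyRL source: the function names `rloo_advantages`, `token_mean`, and `sequence_mean` are hypothetical, and SkyRL's `rloo_n` estimator may scale or normalize advantages differently.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for the n rollouts sampled from one prompt.

    Assumption: plain RLOO, where each rollout's baseline is the mean
    reward of the other n - 1 rollouts in the same group.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two rollouts per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def token_mean(per_token_losses: List[List[float]]) -> float:
    """Average over every token in the batch; longer rollouts weigh more."""
    flat = [x for seq in per_token_losses for x in seq]
    return sum(flat) / len(flat)


def sequence_mean(per_token_losses: List[List[float]]) -> float:
    """Average each rollout first, then average across rollouts (for contrast)."""
    per_seq = [sum(seq) / len(seq) for seq in per_token_losses]
    return sum(per_seq) / len(per_seq)


if __name__ == "__main__":
    # Four rollouts for one prompt with binary task rewards.
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
    # One 3-token rollout and one 1-token rollout.
    losses = [[1.0, 1.0, 1.0], [0.0]]
    print(token_mean(losses), sequence_mean(losses))
```

The two aggregation functions differ only in how rollout length enters the average: with token-level averaging, a long agentic trajectory contributes proportionally more to the gradient than a short one.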
Model tree for laion/a3-rl-laion_exp_rpt_methods2test-large-v2
- Base model: Qwen/Qwen3-8B-Base
- Finetuned: Qwen/Qwen3-8B