a3-rl-laion_exp_rpt_methods2test-large-v2

RL fine-tune (RLOO-n, token_mean) of a Qwen3-8B base (laion/GLM-4_7-swesmith...-fixthink) on the methods2test-large-v2 agentic task set, run via SkyRL terminal-bench / terminus-2.

PARTIAL CHECKPOINT: cancelled mid-run. SLURM job 589447 was cancelled on 2026-06-06 because the a3 experiment series was concluded as uninformative. This artifact is the latest HF-ready export, global_step_50; a raw FSDP checkpoint exists at step 52/53 but was not exported. Training was healthy at cancellation: avg_raw_reward rose from ~0.43 to ~0.70, pass@8 rose from ~0.81 to ~0.88, and entropy stayed stable (~0.07 to ~0.11) with no collapse.

  • Config: hpc/skyrl_yaml/jupiter/56GPU_base.yaml (a3 chain, 14 nodes x 4 GPU)
  • Algorithm: rloo_n, eps_clip 0.2/0.05, lr 8e-6, no KL loss
  • Status: partial / a3-series-concluded
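
For readers unfamiliar with the algorithm named above, here is a minimal sketch of a leave-one-out (RLOO) advantage and a token-mean loss reduction. It assumes that rloo_n follows the standard leave-one-out baseline and that token_mean averages over every token in the batch. The function names are hypothetical and this is not SkyRL's code; the actual implementation may normalize differently.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for n rollouts of the same prompt.

    Each rollout's baseline is the mean reward of the other n - 1 rollouts.
    (Assumed textbook form; not taken from the SkyRL source.)
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least 2 rollouts per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def token_mean(per_token_losses: List[List[float]]) -> float:
    """Average the loss over all tokens in the batch.

    Long sequences therefore contribute more than short ones, unlike a
    per-sequence mean followed by a batch mean.
    """
    flat = [x for seq in per_token_losses for x in seq]
    if not flat:
        raise ValueError("no tokens to average")
    return sum(flat) / len(flat)


if __name__ == "__main__":
    # Four rollouts of one prompt with binary rewards.
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
    # Two sequences of different length.
    print(token_mean([[1.0, 1.0, 1.0], [4.0]]))
```

With a leave-one-out baseline the advantages within a prompt group sum to zero, so a group in which every rollout gets the same reward contributes no gradient.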
Safetensors · Model size: 8B params · Tensor type: BF16