a3-rl-laion_exp_rpt_methods2test-large-v2

RL finetune (RLOO-n advantage estimation, token_mean loss aggregation) of laion/GLM-4_7-swesmith...-fixthink, a Qwen3-8B derivative, on the methods2test-large-v2 agentic task set via SkyRL terminal-bench / terminus-2.
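For readers unfamiliar with RLOO, here is a minimal sketch of the standard leave-one-out baseline: each of the n rollouts for the same task is scored against the mean reward of the other n - 1 rollouts. This card does not say whether SkyRL's `rloo_n` variant adds extra normalization on top of this, and the function name `rloo_advantages` is hypothetical.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for n rollouts of the same task.

    Each rollout's baseline is the mean reward of the other n - 1 rollouts.
    Assumption: plain RLOO; the exact 'rloo_n' variant used in this run
    is not documented here.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two rollouts per task")
    total = sum(rewards)
    # Baseline for rollout i excludes its own reward: (total - r_i) / (n - 1).
    return [r - (total - r) / (n - 1) for r in rewards]


# Example: 4 rollouts of one task with binary pass/fail rewards.
print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
```

The leave-one-out baseline needs no learned value model, and the resulting advantages for one task always sum to zero.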

PARTIAL CHECKPOINT: cancelled mid-run. SLURM job 589447 was cancelled on 2026-06-06 because the a3 experiment series was judged uninformative. This artifact is the latest HF-ready export, global_step_50; a raw FSDP checkpoint exists at step 52/53 but was not exported. Training was healthy at cancellation: avg_raw_reward rose from ~0.43 to ~0.70, pass@8 from ~0.81 to ~0.88, and entropy stayed low and stable (~0.07 to ~0.11) with no collapse.
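The card does not say how pass@8 is computed. A common definition is the unbiased pass@k estimator popularized by HumanEval. The sketch below assumes that definition and binary per-rollout success. With n = k = 8 rollouts per task, it reduces to "at least one of the 8 rollouts passed". The function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n rollouts, c of them successful."""
    if n - c < k:
        # Every size-k subset of rollouts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_task: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n_rollouts, n_successes) per task."""
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)


# Example: two tasks with 8 rollouts each; the first never passes, the second passes 3 times.
print(mean_pass_at_k([(8, 0), (8, 3)], k=8))
```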

Config: hpc/skyrl_yaml/jupiter/56GPU_base.yaml (a3 chain, 14 nodes x 4 GPU)
Algorithm: rloo_n, eps_clip 0.2/0.05, lr 8e-6, no KL loss
Status: partial / a3-series-concluded
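The algorithm line above combines a PPO-style clipped policy loss with token_mean aggregation and no KL term. The sketch below illustrates that combination and is not SkyRL's implementation. The card lists the clip values only as "0.2/0.05". Reading them as a lower bound `eps_low` and an upper bound `eps_high`, in that order, is an assumption, and the function name is hypothetical.

```python
import math
from typing import List


def clipped_token_mean_loss(
    logp_new: List[List[float]],
    logp_old: List[List[float]],
    advantages: List[float],
    eps_low: float,
    eps_high: float,
) -> float:
    """Clipped surrogate policy loss, averaged over all response tokens.

    logp_new / logp_old: per-token log-probs for each sequence (ragged lists).
    advantages: one scalar per sequence (e.g. an RLOO advantage), applied
    to every token of that sequence. No KL penalty is added.
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            # Assumed asymmetric clip range [1 - eps_low, 1 + eps_high].
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            # Pessimistic (min) surrogate, negated so lower is better.
            total += -min(ratio * adv, clipped * adv)
            n_tokens += 1
    # token_mean: divide by the token count of the whole batch, so longer
    # sequences contribute more terms than shorter ones.
    return total / max(n_tokens, 1)


# Example: two sequences (2 tokens and 1 token) with unchanged log-probs.
logp = [[-1.0, -2.0], [-0.5]]
print(clipped_token_mean_loss(logp, logp, [1.0, -1.0], eps_low=0.2, eps_high=0.05))
```

In the example, every token has ratio 1, so the loss is minus the token-weighted mean advantage, -1/3. A per-sequence mean would give 0 instead, which is the practical difference token_mean makes.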


Weights: Safetensors, 8B params, BF16


Model tree for laion/a3-rl-laion_exp_rpt_methods2test-large-v2: Qwen/Qwen3-8B-Base (base model) -> Qwen/Qwen3-8B (finetuned) -> laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink (finetuned) -> this model (finetuned)