
ablation-pymethods2test-seqnorm-15-8B

RLOO length-bias ablation: arm1, the seqnorm (base) arm. RL-trained from the a3 pre-RL base on the exp_rpt_pymethods2test-large task set with SkyRL (RLOO-n advantage estimator, seq_mean_token_sum_norm_global loss reduction).
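As background on the advantage estimator named above, here is a minimal sketch of the standard leave-one-out (RLOO) baseline: each rollout's advantage is its reward minus the mean reward of the other rollouts sampled for the same prompt. The function name `rloo_advantages` and the reward values are illustrative assumptions, not SkyRL's API, and the sketch does not cover the seq_mean_token_sum_norm_global loss reduction.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for n rollouts of a single prompt.

    Hypothetical helper for illustration; not taken from SkyRL.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two rollouts per prompt")
    total = sum(rewards)
    # Baseline for rollout i is the mean reward of the other n - 1 rollouts.
    return [r - (total - r) / (n - 1) for r in rewards]


# Made-up rewards for four rollouts of one prompt.
example_advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
```

Because every baseline excludes the rollout it is applied to, the advantages within a group always sum to zero.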

  • Base model: laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink (a Qwen3-8B SFT)
  • Dataset: DCAgent/exp_rpt_pymethods2test-large
  • Checkpoint: global_step_15 (selected by trailing-5 EMA of reward/avg_raw_reward, alpha=1/3, over the full chain; max_steps=80, hf_save_interval=5)
  • Training framework: SkyRL (FSDP2, vLLM), Jupiter GH200
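The checkpoint selection rule above can be sketched roughly as follows. This is one plausible reading, not the script actually used: it assumes the EMA is the usual recursive form with alpha = 1/3 (which matches a span of 5, since alpha = 2 / (span + 1)), that steps are 1-indexed, and that the saved checkpoint with the highest smoothed reward is chosen. The names `ema` and `select_checkpoint` and the reward series are hypothetical.

```python
from typing import List, Tuple


def ema(values: List[float], alpha: float) -> List[float]:
    """Recursive exponential moving average, seeded with the first value."""
    smoothed: List[float] = []
    current = None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def select_checkpoint(
    rewards: List[float], alpha: float = 1 / 3, save_interval: int = 5
) -> Tuple[int, float]:
    """Return (step, smoothed_reward) for the best saved checkpoint.

    rewards[i] is assumed to be reward/avg_raw_reward at step i + 1.
    Only steps that are multiples of save_interval have a checkpoint.
    """
    smoothed = ema(rewards, alpha)
    candidates = [
        (smoothed[step - 1], step)
        for step in range(save_interval, len(rewards) + 1, save_interval)
    ]
    if not candidates:
        raise ValueError("no checkpoint was saved within the given steps")
    best_value, best_step = max(candidates)
    return best_step, best_value


# Made-up reward curve: low, then high, then a drop.
fake_rewards = [0.1] * 5 + [0.5] * 10 + [0.2] * 5
best_step, best_value = select_checkpoint(fake_rewards)
```

Smoothing before picking a step reduces the chance of selecting a checkpoint on a single noisy reward spike.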

The launch config is included as rl_config.yaml.

Training Traces

Training-time Daytona/Harbor rollouts for this run are uploaded as a companion dataset: penfever/ablation-pymethods2test-seqnorm

The dataset contains the last episode of each trial (per make_and_upload_trace_dataset --episodes last), i.e. the same rollouts the policy was trained on after rollback / truncation.

Safetensors: model size 8B params, tensor type BF16