ablation-pymethods2test-seqnorm-15-8B
RLOO length-bias ablation: arm1, the seqnorm (base) arm. RL-trained from the
a3 pre-RL base on the exp_rpt_pymethods2test-large task set with SkyRL
(RLOO-n advantage estimator, seq_mean_token_sum_norm_global loss reduction).
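The following is a minimal Python sketch of the two settings named above. It is not SkyRL's implementation. `rloo_advantages` uses the standard leave-one-out baseline: each sample's reward minus the mean reward of the other n-1 samples drawn for the same prompt. For `seq_mean_token_sum_norm`, we assume that per-token losses are summed within each sequence, divided by one constant shared across the batch (our guess at what the `_global` suffix refers to; `norm_const` is a hypothetical parameter), and then averaged over sequences. SkyRL's exact definition may differ. `seq_mean_token_mean` is included only for contrast.

```python
from statistics import fmean


def rloo_advantages(rewards: list[float]) -> list[float]:
    """RLOO advantage for the n samples drawn for one prompt.

    Each sample's baseline is the mean reward of the other n - 1 samples.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def seq_mean_token_sum_norm(token_losses: list[list[float]], norm_const: float) -> float:
    """Assumed form: sum token losses per sequence, divide by one batch-wide
    constant (hypothetical `norm_const`), then average over sequences.
    Every token gets weight 1 / (B * norm_const), independent of sequence length."""
    return fmean([sum(seq) / norm_const for seq in token_losses])


def seq_mean_token_mean(token_losses: list[list[float]]) -> float:
    """Contrast: per-sequence mean, then mean over sequences.
    A token in a length-L sequence gets weight 1 / (B * L), favoring short sequences."""
    return fmean([fmean(seq) for seq in token_losses])


if __name__ == "__main__":
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))  # made-up rewards for 4 samples
    batch = [[2.0, 2.0], [0.0] * 8]               # made-up per-token losses
    print(seq_mean_token_sum_norm(batch, norm_const=8.0))
    print(seq_mean_token_mean(batch))
```

With a constant normalizer, each token carries the same weight regardless of how long its sequence is. A per-sequence mean divides by each sequence's own length and so up-weights tokens in short sequences, which is the kind of length effect an ablation like this one would probe.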
- Base model: laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink (a Qwen3-8B SFT)
- Dataset: DCAgent/exp_rpt_pymethods2test-large
- Checkpoint: global_step_15 (selected by trailing-5 EMA of reward/avg_raw_reward, alpha=1/3, over the full chain; max_steps=80, hf_save_interval=5)
- Training framework: SkyRL (FSDP2, vLLM), Jupiter GH200
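The checkpoint rule above admits more than one reading. This sketch assumes the following: each saved step (every hf_save_interval = 5 steps up to max_steps = 80) is scored by an EMA with alpha = 1/3 over the five most recent per-step values of reward/avg_raw_reward ending at that step, the EMA is seeded with the oldest value in the window, and the highest-scoring step is kept. The function names and the seeding choice are hypothetical. The actual selection script may differ, for example in how "over the full chain" stitches resumed runs together.

```python
def trailing_ema(values: list[float], alpha: float) -> float:
    """Exponential moving average seeded with the first value in the window."""
    ema = values[0]
    for v in values[1:]:
        ema = alpha * v + (1 - alpha) * ema
    return ema


def select_checkpoint(
    rewards_by_step: dict[int, float],
    max_steps: int = 80,
    save_interval: int = 5,
    window: int = 5,
    alpha: float = 1 / 3,
) -> tuple[int, float]:
    """Score each saved step by the EMA of the `window` most recent rewards
    ending at that step; return (best_step, best_score). Ties keep the earlier step.
    Returns (-1, -inf) if no saved step has any logged reward."""
    best_step, best_score = -1, float("-inf")
    for step in range(save_interval, max_steps + 1, save_interval):
        recent = [
            rewards_by_step[s]
            for s in range(step - window + 1, step + 1)
            if s in rewards_by_step
        ]
        if not recent:
            continue  # no logged rewards in this window
        score = trailing_ema(recent, alpha)
        if score > best_score:
            best_step, best_score = step, score
    return best_step, best_score


if __name__ == "__main__":
    # Made-up reward curve, for illustration only: flat at 0.5 except steps 11-15.
    rewards = {s: (0.9 if 11 <= s <= 15 else 0.5) for s in range(1, 81)}
    print(select_checkpoint(rewards))
```

Smoothing over a short trailing window damps single-step reward spikes. A checkpoint is favored only when the rewards logged just before it are consistently high.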
The launch config is included as rl_config.yaml.
Training Traces
Training-time Daytona/Harbor rollouts for this run are uploaded as a companion dataset: penfever/ablation-pymethods2test-seqnorm
The dataset contains the last episode of each trial (per
make_and_upload_trace_dataset --episodes last), i.e. the same rollouts
the policy was trained on after rollback / truncation.