# swesmith_8b-step35

Qwen3-8B trained with RL on SWEsmith tasks (32k native context, no RoPE scaling, 35 training steps).

## Training Details

| Parameter | Value |
|---|---|
| Base model | `laion/r2egym-nl2bash-stack-bugsseq-fixthink-again` (Qwen3-8B SFT) |
| Dataset | SWEsmith oracle-verified (2,500 tasks, 120 s timeout) |
| Algorithm | RLOO-N (leave-one-out with neutral masking; sketched below) |
| Learning rate | 2.0e-5 |
| Train batch size | 32 |
| Samples per prompt | 8 |
| Max episodes | 64 |
| Max generate length | 8,192 tokens |
| Max input tokens | 24,000 |
| Max model length | 32,768 |
| RoPE scaling | None (32k native context) |
| KL loss | Disabled |
| Reward shaping | Enabled (pass_ratio) |
| Staleness steps | 16 |
| Policy nodes | 2 (8 GPUs, FSDP2) |
| Inference engines | 20 (TP=1) |
| Training steps | 35 |
| Framework | BenSkyRL + Harbor |
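
To make the objective concrete, here is a minimal sketch of the RLOO leave-one-out advantage with neutral masking, assuming a `pass_ratio` reward in [0, 1] (the reward shaping named above). The function name, the `-1.0` neutral sentinel, and the exact masking rule are illustrative assumptions, not the BenSkyRL implementation.

```python
import numpy as np

def rloo_advantages(rewards: np.ndarray, neutral: float = -1.0):
    """Leave-one-out advantages for one prompt's K sampled episodes.

    rewards: shape (K,), e.g. pass_ratio in [0, 1] per episode.
    neutral: sentinel marking episodes excluded from the loss
             (the sentinel value is an assumption for this sketch).
    """
    mask = rewards != neutral            # neutral episodes contribute nothing
    k = mask.sum()
    adv = np.zeros_like(rewards)
    if k >= 2:
        total = rewards[mask].sum()
        # Baseline for sample i is the mean reward of the other k-1 samples.
        adv[mask] = rewards[mask] - (total - rewards[mask]) / (k - 1)
    return adv, mask

# With 8 samples per prompt, as configured above:
rewards = np.array([1.0, 0.5, 0.0, -1.0, 0.25, 0.0, 1.0, 0.5])
adv, mask = rloo_advantages(rewards)
```

The policy-gradient loss would then weight each episode's token log-probabilities by its advantage, with masked (neutral) episodes receiving zero weight.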

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Weights are stored in BF16; torch_dtype="auto" loads them as-is.
model = AutoModelForCausalLM.from_pretrained("laion/swesmith_8b-step35", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("laion/swesmith_8b-step35")
```
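
A hypothetical generation call, assuming the standard Qwen3 chat template; the prompt and sampling settings below are illustrative, not taken from the training run:

```python
messages = [{"role": "user", "content": "Fix the failing test in utils/parse.py."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Training allowed up to 8,192 generated tokens per turn.
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```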