thomasjhuang/qwen2-rloo-countdown-step350

Text Generation

reinforcement-learning

Commit History

Add model card with training details

240e2b5
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 350 - Fixed prompt format, temp=0.1, lr=3e-6

ca96d28
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 350 - Fixed prompt format, temp=0.1, lr=3e-6

d679490
verified

thomasjhuang committed on Jun 10, 2025

initial commit

a61a5b8
verified

thomasjhuang committed on Jun 10, 2025