thomasjhuang/qwen2-rloo-countdown-step150

Text Generation

reinforcement-learning

Commit History

Add model card with training details

5473b25
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 150 - Fixed prompt format, temp=0.1, lr=3e-6

8c36a57
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 150 - Fixed prompt format, temp=0.1, lr=3e-6

85219cc
verified

thomasjhuang committed on Jun 10, 2025

initial commit

649e82b
verified

thomasjhuang committed on Jun 10, 2025