Text Generation
Safetensors
English
trl
qwen3
dpo
agent
tool-use
alfworld
conversational

Commit History

Upload DPO-trained Qwen3-4B-Instruct-2507 model
c87d0f7

rokugatsu committed on

initial commit
5647194

rokugatsu committed on