How to use trl-lib/Qwen2-0.5B-Reward-Math-Sheperd with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="trl-lib/Qwen2-0.5B-Reward-Math-Sheperd")
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("trl-lib/Qwen2-0.5B-Reward-Math-Sheperd")
model = AutoModelForTokenClassification.from_pretrained("trl-lib/Qwen2-0.5B-Reward-Math-Sheperd")
```
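Since the model is loaded as a token-classification head, it produces per-token logits rather than a single score. A minimal sketch of how per-step scores could be derived from such outputs, using synthetic logits in place of a real forward pass (the two-class head and the idea of reading scores at step-boundary positions are assumptions based on the Math-Shepherd process-reward setup, not confirmed details of this checkpoint):

```python
import torch

def step_scores_from_logits(logits: torch.Tensor, step_mask: torch.Tensor) -> list[float]:
    """Turn per-token two-class logits into a probability of correctness
    for each solution step, read at the positions flagged by step_mask
    (e.g. where a step-separator token appears)."""
    probs = logits.softmax(dim=-1)[..., 1]  # P(correct) per token
    return probs[step_mask.bool()].tolist()

# Synthetic example: 6 tokens, step boundaries at positions 2 and 5.
logits = torch.tensor([[0.0, 0.0],
                       [0.0, 0.0],
                       [-2.0, 2.0],   # confident "correct" at first boundary
                       [0.0, 0.0],
                       [0.0, 0.0],
                       [2.0, -2.0]])  # confident "incorrect" at second boundary
step_mask = torch.tensor([0, 0, 1, 0, 0, 1])

scores = step_scores_from_logits(logits, step_mask)
print([round(s, 3) for s in scores])  # → [0.982, 0.018]
```

With a real forward pass, `logits` would come from `model(**tokenizer(text, return_tensors="pt")).logits[0]` and `step_mask` from matching the separator token IDs in the input.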
I think the model needs to be trained for at least one epoch. Anyhow, great work.