docs: add training baseline, reward, and loss curve plots for multiple experimental runs 28abef0 adityss committed 26 days ago