feat: add baseline evaluation tools and demo scripts for RL performance comparison c395f6a adityss committed 27 days ago