feat: add baseline evaluation tools and demo scripts for RL performance comparison c395f6a adityss committed 24 days ago