fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement (bdc9954, adityss, committed 24 days ago)
feat: add baseline evaluation tools and demo scripts for RL performance comparison (c395f6a, adityss, committed 24 days ago)