fix: update training script with seed variation, fix reward normalization, regenerate training curves showing 0.52->0.67 improvement (bdc9954, adityss committed 26 days ago)
feat: commit training evidence, update README with real scores, add demo scripts (8204dc0, adityss committed 26 days ago)