leaderboard / data / experiments

Commit History

chore: update data with agentic results
a83c01d
unverified

tareknaser committed on

feat: add gemini and deepseek results
de324c6

tareknaser committed on

chore: update results
3cb4b1d
unverified

tareknaser committed on

refactor: migrate to integrate with inspect evals
f3d287f
unverified

tareknaser committed on

feat: a simple leaderboard with filtering
7e7fbbc
unverified

tareknaser committed on