Datasets with reasoning traces for math and code (Train + Eval)
Maojia Song
OrangeEye
AI & ML interests
None yet
Recent Activity
upvoted a paper about 10 hours ago
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
upvoted a paper 8 days ago
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
upvoted a paper 8 days ago
Agents' Last Exam