A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction
Young-Jun Lee PRO
passing2961
AI & ML interests
Social Dialogue System, Multi-Modal Dialogue
Recent Activity
upvoted a paper about 6 hours ago
OpenThoughts-Agent: Data Recipes for Agentic Models
upvoted a paper about 6 hours ago
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
upvoted a paper about 6 hours ago
Qwen-AgentWorld: Language World Models for General Agents