Stanford CongLab

community

https://conglab.com/

Stanford-CongLab

Recent Activity

black-yt

updated a dataset about 2 months ago

Stanford-CongLab/LabHorizon-Protocol-Conditioned-Planning

Viewer • Updated Jun 13 • 3.2k • 113 • 1

black-yt

updated a model about 2 months ago

Stanford-CongLab/LabHorizon-Model

Image-Text-to-Text • Updated Jun 13 • 8 • 1

black-yt

updated a dataset about 2 months ago

Stanford-CongLab/LabHorizon-3D-Asset-Perception

Viewer • Updated Jun 13 • 3.2k • 123 • 1

black-yt

authored a paper about 2 months ago

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Paper • 2606.07591 • Published May 28 • 102

black-yt

posted an update about 2 months ago

Hey all — our ResearchClawBench leaderboard just updated 🔥

We let AI do real science: 40 tasks across 10 disciplines, with results compared against human papers. A hard example? 🏔️ Glacier mass change: the AI must integrate 233 datasets from 35 teams and 4 methods, then reproduce the 6542±387 Gt ice-loss figure, checked against IPCC. No toy problems.

Latest leaderboard (2026-06-09; score of 50 = matches human) 📊:
Agents: 🥇 Claude Code 21.5, $5.3; 🥈 EvoScientist 18.8, $4.1; 🥉 Codex CLI 18.4, just $2.0
LLMs+Harness: 🥇 Claude-Opus-4.8 21.1, $4.0; 🥈 Claude-Opus-4.7 20.7; 🥉 MiniMax-M3 19.8, only $0.45; then Qwen3.7-Max 18.7, $0.42, 11 min 💥

Claude still king, but MiniMax/Qwen/DeepSeek are crazy cheap and competitive. Expensive isn't always better.
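
To put a rough number on "expensive isn't always better", here is a minimal sketch that divides each quoted score by its quoted cost. The scores and dollar figures are copied from the leaderboard lines above; the score-per-dollar ratio and the helper names (`score_per_dollar`, `rank_by_efficiency`) are illustrative assumptions, not a metric the benchmark itself reports. Claude-Opus-4.7 is omitted because no cost is given for it.

```python
# Figures copied from the 2026-06-09 leaderboard in this post: (score, cost in USD).
# Claude-Opus-4.7 is left out because the post lists no cost for it.
LEADERBOARD = {
    "Claude Code": (21.5, 5.3),
    "EvoScientist": (18.8, 4.1),
    "Codex CLI": (18.4, 2.0),
    "Claude-Opus-4.8": (21.1, 4.0),
    "MiniMax-M3": (19.8, 0.45),
    "Qwen3.7-Max": (18.7, 0.42),
}


def score_per_dollar(score: float, cost_usd: float) -> float:
    """Hypothetical efficiency metric: leaderboard points per dollar spent."""
    return score / cost_usd


def rank_by_efficiency(entries: dict) -> list:
    """Return (name, score_per_dollar) pairs, most efficient first."""
    ratios = [
        (name, score_per_dollar(score, cost))
        for name, (score, cost) in entries.items()
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_efficiency(LEADERBOARD):
        score, cost = LEADERBOARD[name]
        print(f"{name:16s} score={score:5.1f}  cost=${cost:5.2f}  score/$={ratio:6.1f}")
```

On these six entries the two cheapest models land at the top of the ratio ranking and Claude Code at the bottom, even though Claude Code has the highest raw score. The ratio is a crude lens: it ignores run time and treats every leaderboard point as equally valuable.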

📎 Code & star: https://github.com/InternScience/ResearchClawBench
🏠 Website: https://internscience.github.io/ResearchClawBench-Home/
🤗 Upvote paper: ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research (2606.07591)