Open CoT Leaderboard

community


AI & ML interests

Chain of Thought, LLM Evaluation

Recent Activity

yakazimir authored a paper 16 days ago

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

yakazimir authored a paper 16 days ago

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents

yakazimir authored a paper 16 days ago

Probabilistic Programs of Thought


authored 4 papers 16 days ago

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

Paper • 2510.21652 • Published Oct 24, 2025 • 4

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents

Paper • 2510.06579 • Published Oct 8, 2025

Probabilistic Programs of Thought

Paper • 2604.17290 • Published 26 days ago

Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis

Paper • 2604.23072 • Published 21 days ago

authored a paper 11 months ago

Language Modeling by Language Models

Paper • 2506.20249 • Published Jun 25, 2025 • 1

authored a paper about 1 year ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 21

updated 2 datasets about 1 year ago

cot-leaderboard/cot-leaderboard-requests

Preview • Updated Feb 26, 2025 • 34

cot-leaderboard/cot-leaderboard-results

Viewer • Updated Feb 26, 2025 • 133 • 91

in cot-leaderboard/cot-leaderboard-results about 1 year ago

Update leaderboard for model DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN

#134 opened over 1 year ago by

Update leaderboard for model DebateLabKIT/Llama-3.3-Argunaut-1-70B-SPIN-dev1

#135 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Llama-8B

#137 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

#138 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

#139 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

#140 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

#141 opened about 1 year ago by

Update leaderboard for model deepseek-ai/DeepSeek-R1-Distill-Llama-70B

#142 opened about 1 year ago by

in cot-leaderboard/cot-eval-results about 1 year ago

Upload results for model DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN

#1047 opened over 1 year ago by

Upload results for model DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN

#1048 opened over 1 year ago by

Upload results for model DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN

#1049 opened over 1 year ago by

Upload results for model DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN

#1050 opened over 1 year ago by