AMA-Bench

non-profit

https://ama-bench.github.io/

Recent Activity

Snyhlxde submitted a paper about 21 hours ago

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

YujieZhao updated a Space 10 days ago

AMA-bench/AMA-bench-Leaderboard

YujieZhao new activity 19 days ago

AMA-bench/AMA-bench:Add question-answering task category to metadata

submitted a paper to Daily Papers about 21 hours ago

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Paper • 2606.18394 • Published 2 days ago • 25

updated a Space 10 days ago

AMA-Bench Leaderboard

Explore and compare AI model performance across tasks

in AMA-bench/AMA-bench 19 days ago

Add question-answering task category to metadata

#2 opened about 1 month ago

authored a paper about 1 month ago

MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper • 2605.14212 • Published May 14 • 18

authored a paper about 1 month ago

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Paper • 2602.22769 • Published Feb 26 • 10

authored 4 papers about 2 months ago

aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

Paper • 2508.15126 • Published Aug 20, 2025 • 20

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Paper • 2602.22769 • Published Feb 26 • 10

L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search

Paper • 2509.00761 • Published Aug 31, 2025

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

Paper • 2604.23853 • Published Apr 26 • 2

updated a dataset 2 months ago

AMA-bench/AMA-bench

Viewer • Updated 19 days ago • 208 • 369 • 9

authored a paper 3 months ago

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Paper • 2603.13428 • Published Mar 13 • 21

published a Space 4 months ago

AMA-Bench Leaderboard

Explore and compare AI model performance across tasks

published a dataset 4 months ago

AMA-bench/AMA-bench

Viewer • Updated 19 days ago • 208 • 369 • 9

updated a Space 4 months ago

AMA-Bench Leaderboard

Explore and compare AI model performance across tasks

updated a Space 5 months ago

AMA-Bench Leaderboard

Explore and compare AI model performance across tasks

authored 2 papers 6 months ago

Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

Paper • 2512.02942 • Published Dec 2, 2025 • 5

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published Dec 16, 2025 • 44

submitted a paper to Daily Papers 6 months ago

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published Dec 16, 2025 • 44

authored a paper 8 months ago

Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs

Paper • 2510.11062 • Published Oct 13, 2025 • 29

authored a paper about 1 year ago

lmgame-Bench: How Good are LLMs at Playing Games?

Paper • 2505.15146 • Published May 21, 2025 • 20