tuandunghcmut's Collections

Agentic Benchmarks

updated 9 days ago

  • OpenResearcher/OpenResearcher-Dataset

    Viewer • Updated 23 days ago • 97.6k • 16.6k • 107

    Note For Deep Research Agent


  • gaia-benchmark/GAIA

    Viewer • Updated Oct 28, 2025 • 932 • 18.4k • 618

  • vaskarnath/toolcomp

    Viewer • Updated Aug 21, 2025 • 493 • 37 • 1

    Note ToolComp from Scale AI


  • vaskarnath/toolcomp_process_supervision_eval

    Viewer • Updated Aug 21, 2025 • 1.72k • 52 • 2

  • ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark

    Paper • 2501.01290 • Published Jan 2, 2025 • 1