hppdqdq's Collections
Benchmarks

updated Jan 13, 2025

  • Running on CPU Upgrade
    Agents
    251

    MMLU-Pro Leaderboard

    🥇

    More advanced and challenging multi-task evaluation


  • Running
    62

    Stick To Your Role! Leaderboard

    🎭

    Benchmarking LLMs on the stability of simulated populations


  • Running
    53

    ZeroEval Leaderboard

    📊

    Explore ZeroEval embedding benchmark online


  • Runtime error
    Agents
    26

    Decentralized Arena Leaderboard

    🥇

    View and compare LLM evaluations across various domains


  • Runtime error
    Agents
    Featured
    437

    Open Medical-LLM Leaderboard

    🥇

    Explore and submit models for benchmarking


  • Paused
    Agents
    354

    GPU Poor LLM Arena

    🏆

    Compact LLM Battle Arena: Frugal AI Face-Off!


  • Running
    Agents
    Featured
    135

    Open VLM Video Leaderboard

    🌎

    VLMEvalKit Eval Results in video understanding benchmark


  • Running on CPU Upgrade
    14k

    Open LLM Leaderboard

    🏆

    Track, rank and evaluate open LLMs and chatbots


  • Running
    Agents
    486

    TTS Spaces Arena

    🤗

    Blind vote on HF TTS models!
