Space: evaleval/general-eval-card
general-eval-card / metadata
  • 7 contributors
History: 2 commits
Latest commit: evijit (HF Staff), "Separate policy and researcher views", 9b4cdbb, 21 days ago
  • benchmark_card_BoolQ.json
    7.18 kB
    fix bugs about 1 month ago
  • benchmark_card_CNN_DailyMail.json
    6.32 kB
    fix bugs about 1 month ago
  • benchmark_card_CivilComments.json
    7.62 kB
    fix bugs about 1 month ago
  • benchmark_card_GPQA.json
    8.62 kB
    fix bugs about 1 month ago
  • benchmark_card_GSM8K.json
    7.01 kB
    fix bugs about 1 month ago
  • benchmark_card_HellaSwag.json
    7.6 kB
    fix bugs about 1 month ago
  • benchmark_card_IFEval.json
    6.4 kB
    fix bugs about 1 month ago
  • benchmark_card_LegalBench.json
    7.06 kB
    fix bugs about 1 month ago
  • benchmark_card_MATH_Level_5.json
    6.35 kB
    fix bugs about 1 month ago
  • benchmark_card_MMLU-Pro.json
    7.77 kB
    fix bugs about 1 month ago
  • benchmark_card_MMLU.json
    6.33 kB
    fix bugs about 1 month ago
  • benchmark_card_MUSR.json
    8.3 kB
    fix bugs about 1 month ago
  • benchmark_card_MedQA.json
    6.6 kB
    fix bugs about 1 month ago
  • benchmark_card_Omni-MATH.json
    7.03 kB
    fix bugs about 1 month ago
  • benchmark_card_QuAC.json
    7.72 kB
    fix bugs about 1 month ago
  • benchmark_card_WildBench.json
    8.04 kB
    fix bugs about 1 month ago
  • benchmark_known_issues.json
    4.06 kB
    Separate policy and researcher views 21 days ago