Commit History
Upload from GitHub Actions: Improve UX and style 53d2039 verified
Upload from GitHub Actions: Merge remote changes and apply terminology updates: Commercial->closed-source, Open->open-source ebaf279 verified
Upload from GitHub Actions: Use task subset for average score b1e5b40 verified
Upload from GitHub Actions: Eavaluate on 40 languages 941d5c5 verified
Upload from GitHub Actions: Add math benchmarks 549360a verified
Upload from GitHub Actions: More results 52abc5b verified
Upload from GitHub Actions: Update model ranking fetching f840423 verified
Upload from GitHub Actions: Use FLORES+ via Huggingface 913253a verified
Upload from GitHub Actions: Quick fixes 9c2c019 verified
Upload from GitHub Actions: More models 0bd935e verified
Upload from GitHub Actions: Increase n_models d09b095 verified
Upload from GitHub Actions: New results b311dd5 verified
Upload from GitHub Actions: Merge pull request #4 from datenlabor-bmz/jonas-dev 7c6a118 verified
Upload from GitHub Actions: Fix vibecoding 75010c2 verified
Upload from GitHub Actions: Ugly fix for CI errors adc94d7 verified
Upload from GitHub Actions: Try moving `cache` calls that cause CI issues bc4afa0 verified
Upload from GitHub Actions: Exclude free models from evals c9e9db6 verified
Upload from GitHub Actions: Display N/A scores as such 1e8952a verified
Block gemini-2.5-pro-exp-03-25 092c06a
David Pomerenke committed on
Pass through kwargs 5fa433f
David Pomerenke committed on
Fix dataset loading c990cb9
David Pomerenke committed on
Temporarily disable classification task a48ff53
David Pomerenke committed on
Fix path and dev group declaration 1614427
David Pomerenke committed on
Fix import paths c567aee
David Pomerenke committed on
added download function and edited INFO f529b7b
Use most popular current + historical models 9983b5f
David Pomerenke committed on
Only run tasks for which there is no result yet 2f9dee1
David Pomerenke committed on
Run on 40 languages, additional models 260c1a3
David Pomerenke committed on
Shorter classification prompt + error handling 0384b92
David Pomerenke committed on
Move functions for sharing them 55406ba
David Pomerenke committed on
Fix response when no evals data is available 32d50b0
David Pomerenke committed on
Fix: don't cache model metadata forever c29b8da
David Pomerenke committed on
Run on 15 languages f8a3dad
David Pomerenke committed on
Update models 8941a67
David Pomerenke committed on
Implement MMLU task a683732
David Pomerenke committed on
MMLU data loader for 3 parallel datasets 47170a5
David Pomerenke committed on
Analyze MMLU datasets 031925d
David Pomerenke committed on
Add Global MMLU benchmark ce2acb0
David Pomerenke committed on
Translation both from and to 731eddd
David Pomerenke committed on
Get popular models from OpenRouter a32a92f
David Pomerenke committed on
Add OpenRouter metadata to models 9002fc2
David Pomerenke committed on
Run on 100 languages, adjust display 8274634
David Pomerenke committed on
Add Dockerfile 4d13673
David Pomerenke committed on
Fix world map and apply filters for it 92d8154
David Pomerenke committed on
Fix and refactor backend filtering eb1696c
David Pomerenke committed on
Speed things up 566c57e
David Pomerenke committed on
Language selection checkboxes & filtering in backend d91b022
David Pomerenke committed on
Basic backend setup with FastApi but without actual filtering 2c21cf7
David Pomerenke committed on
Add OpenGPT-X 43057f8
David Pomerenke committed on