Commit History
Upload from GitHub Actions: Add auto-translated datasets 68a93b5 verified
Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17 a0d1624 verified
Upload from GitHub Actions: Add auto-translated datasets c790fdb verified
Upload from GitHub Actions: minor caching change b39df3c verified
Upload from GitHub Actions: updated and cleaned up scripts for new eval runs 963cb78 verified
Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions 8eebb41 verified
Upload from GitHub Actions: updated translation functions 8f5ce26 verified
Upload from GitHub Actions: Merge pull request #13 from datenlabor-bmz/jn-dev 80d21cb verified
Upload from GitHub Actions: updated batch size and delay 02f927b verified
Upload from GitHub Actions: updated workflow settings e51c770 verified
Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev 7c06aef verified
Upload from GitHub Actions: Merge pull request #7 from datenlabor-bmz/jn-dev 6878a71 verified
Upload from GitHub Actions: Get more results, compute average based on all tasks 98c6811 verified
Upload from GitHub Actions: Correlation plot b0aa389 verified
Upload from GitHub Actions: Evaluate on autotranslated GSM dataset f3a09a2 verified
Upload from GitHub Actions: Evaluate Google Translate 338dc9b verified
Upload from GitHub Actions: More models and languages a73f888 verified
Upload from GitHub Actions: Merge remote changes and apply terminology updates: Commercial->closed-source, Open->open-source ebaf279 verified
Upload from GitHub Actions: Evaluate on 40 languages 941d5c5 verified
Upload from GitHub Actions: More results 52abc5b verified
Upload from GitHub Actions: Update model ranking fetching f840423 verified
Upload from GitHub Actions: Use FLORES+ via Huggingface 913253a verified
Upload from GitHub Actions: More models 0bd935e verified
Upload from GitHub Actions: New results b311dd5 verified
Upload from GitHub Actions: Try moving `cache` calls that cause CI issues bc4afa0 verified
Upload from GitHub Actions: Exclude free models from evals c9e9db6 verified
Block gemini-2.5-pro-exp-03-25 092c06a
David Pomerenke committed on
Use most popular current + historical models 9983b5f
David Pomerenke committed on
Run on 40 languages, additional models 260c1a3
David Pomerenke committed on
Fix: don't cache model metadata forever c29b8da
David Pomerenke committed on
Update models 8941a67
David Pomerenke committed on
Add Global MMLU benchmark ce2acb0
David Pomerenke committed on
Translation both from and to 731eddd
David Pomerenke committed on
Get popular models from OpenRouter a32a92f
David Pomerenke committed on
Add OpenRouter metadata to models 9002fc2
David Pomerenke committed on
Run on 100 languages, adjust display 8274634
David Pomerenke committed on
Language selection checkboxes & filtering in backend d91b022
David Pomerenke committed on
Add OpenGPT-X 43057f8
David Pomerenke committed on
More models c5278dd
David Pomerenke committed on
Nicer model table with type and size and filters and colourful score bars 9dbdcb2
David Pomerenke committed on
Params and license metadata from HF API 3ed02d5
David Pomerenke committed on
Refactor eval code into files da6e1bc
David Pomerenke committed on