Upload from GitHub Actions: blocklist: drop the grace for slow failing models, not just egregious ones 608b646 verified davidpomerenke committed on 23 days ago
Upload from GitHub Actions: models: drop kimi-k2.6; exclude egregiously-failing models after one run 4c601bb verified davidpomerenke committed on 23 days ago
Upload from GitHub Actions: eval: check runtime budget per-batch so a slow model can't blow the 6h cap 594d28a verified davidpomerenke committed on 24 days ago
Upload from GitHub Actions: eval: fix per-combo resilience (tqdm_asyncio.gather has no return_exceptions) 18acfdb verified davidpomerenke committed on 24 days ago
Upload from GitHub Actions: eval: don't let one bad (task,language) combo crash the whole run a92221e verified davidpomerenke committed on 24 days ago
Upload from GitHub Actions: models: migrate catalog to /api/v1/models; enforce privacy per-request f502bec verified davidpomerenke committed on 24 days ago
Upload from GitHub Actions: discovery: surface newer flagships from curated families; blocklist: require 2 consecutive bad runs f28fed1 verified davidpomerenke committed on 24 days ago
Upload from GitHub Actions: discovery: one flagship per product line; eval: graceful 6h-safe runtime budget 4047210 verified davidpomerenke committed on 25 days ago
Upload from GitHub Actions: main: gate publishing on coverage-completeness, not just error rate c1041db verified davidpomerenke committed on 28 days ago
Upload from GitHub Actions: util: retry HF push with backoff; write local snapshot before push 2a1f0a5 verified davidpomerenke committed on 28 days ago
Upload from GitHub Actions: main: checkpoint per fully-evaluated model instead of once at the end eaa7534 verified davidpomerenke committed on 29 days ago
Upload from GitHub Actions: models: replace claude-opus-4.5/4.6 with 4.8 in curated list 15e8f68 verified davidpomerenke committed on about 1 month ago
Upload from GitHub Actions: discovery: filter out voice/ASR/vision/build endpoints f2add9e verified davidpomerenke committed on about 1 month ago
Upload from GitHub Actions: backend: handle null creation_date in three apply() calls f6a28ed verified davidpomerenke committed on May 19
Upload from GitHub Actions: fast-fail on account-level API errors; refuse to ship runs with >80% errors c2afc16 verified davidpomerenke committed on May 19
Upload from GitHub Actions: unblock workflow: materialize gcloud creds on runner; lazy-init translate client bec2f46 verified davidpomerenke committed on May 19
Upload from GitHub Actions: guard main.py against partial-scale HF pushes; restore aggregated results 691e6c2 verified davidpomerenke committed on May 19
Upload from GitHub Actions: refresh pyproject metadata + README HF frontmatter 1eccc3f verified davidpomerenke committed on May 18
Upload from GitHub Actions: Merge pull request #28 from datenlabor-bmz/jn-dev 55b63ea verified davidpomerenke committed on Jan 6
Upload from GitHub Actions: add gpt-5.1, gemini-3 9ea2dd3 verified davidpomerenke committed on Nov 30, 2025
Upload from GitHub Actions: flores filter for available dev split 34b05c6 verified davidpomerenke committed on Nov 10, 2025
Upload from GitHub Actions: model name no bracket stuff aa92add verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: drop normalization 972026c verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: improve norwegian fix 6f0e312 verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: Merge pull request #22 from datenlabor-bmz/dev 2cdada4 verified davidpomerenke committed on Oct 27, 2025
Upload from GitHub Actions: Add auto-translated datasets 68a93b5 verified davidpomerenke committed on Sep 20, 2025
Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17 a0d1624 verified davidpomerenke committed on Sep 11, 2025
Upload from GitHub Actions: Add auto-translated datasets c790fdb verified davidpomerenke committed on Sep 1, 2025
Upload from GitHub Actions: ran full evaluation locally 088f96f verified davidpomerenke committed on Aug 30, 2025
Upload from GitHub Actions: minor chashing change b39df3c verified davidpomerenke committed on Aug 29, 2025
Upload from GitHub Actions: updated and cleaned up scripts for new eval runs 963cb78 verified davidpomerenke committed on Aug 29, 2025
Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions 8eebb41 verified davidpomerenke committed on Aug 27, 2025
Upload from GitHub Actions: Add Todos for using existing machine-translated datasets rather than our own ones 56adaa2 verified davidpomerenke committed on Aug 14, 2025
Upload from GitHub Actions: updated translation functions 8f5ce26 verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: import flexibility on backend b8cbeff verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: fixed import error 0a30811 verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: updated frontend and backend to fix bugs 4e8cb1a verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: Merge pull request #13 from datenlabor-bmz/jn-dev 80d21cb verified davidpomerenke committed on Aug 8, 2025
Upload from GitHub Actions: Merge pull request #10 from datenlabor-bmz/jn-dev c2eeeac verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: updated batch size and delay 02f927b verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: updated workflow settings e51c770 verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev 7c06aef verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: Merge pull request #7 from datenlabor-bmz/jn-dev 6878a71 verified davidpomerenke committed on Jul 25, 2025
Upload from GitHub Actions: Merge pull request #6 from datenlabor-bmz/jn-dev 6234f5c verified davidpomerenke committed on Jul 24, 2025