title: README
emoji: 🌖
colorFrom: pink
colorTo: blue
sdk: static
pinned: false

Metaphi, Inc

We introduce CREW, the Cross function Enterprise Work Index, to evaluate frontier AI models on long-horizon enterprise tasks.

CREW-Agents

| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
|---|---|---|---|---|---|
| Fin Agent | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| Enterprise Knowledge Agent | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| Front-end Agent | Senior frontend engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
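
The verifier types listed for the agents (binary pass/fail checks and expert preference pairs) could be aggregated into headline numbers roughly as in the sketch below. This is illustrative only: the `PreferencePair` schema, the function names, and the choice of simple averages are assumptions, not CREW's published scoring method.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One expert judgment between two outputs (hypothetical schema)."""
    model_output_id: str
    baseline_output_id: str
    preferred_id: str  # ID of the output the expert preferred


def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks whose programmatic check returned pass."""
    if not results:
        raise ValueError("no results to aggregate")
    return sum(results) / len(results)


def win_rate(pairs: list[PreferencePair]) -> float:
    """Fraction of preference pairs in which the model's output was preferred."""
    if not pairs:
        raise ValueError("no preference pairs to aggregate")
    wins = sum(p.preferred_id == p.model_output_id for p in pairs)
    return wins / len(pairs)
```

For example, `pass_rate([True, False, True, True])` summarizes four binary task outcomes as a single fraction, and `win_rate` does the same for a list of expert preference judgments.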

Leaderboard

Results at evals.metaphi.ai/crew/leaderboard

About

Metaphi is an applied AI research lab founded on the mission of scaling out RL environments for long-horizon agents.

We partner with the world's leading domain experts to curate our environments and to train in-house reward models for programmatic verification of autonomous agents.

Website: metaphi.ai