
---
title: README
emoji: 🌖
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---

Metaphi, Inc

We introduce CREW, the Cross function Enterprise Work Index, to evaluate frontier AI models on long-horizon enterprise tasks.

CREW-Agents

| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
| --- | --- | --- | --- | --- | --- |
| Fin Agent | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| Enterprise Knowledge Agent | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| Front-end Agent | Senior frontend engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
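
The three verifier types in the table (binary pass/fail, skill-based rubrics, preference pairs) each reduce to a simple aggregate score. This README does not describe how CREW implements or aggregates them, so the following is only a minimal sketch under stated assumptions: the record shapes, field names, rubric weights, and function names are all hypothetical and are not Metaphi's actual API.

```python
from dataclasses import dataclass


@dataclass
class BinaryResult:
    """One programmatic check outcome (hypothetical record shape)."""
    task_id: str
    passed: bool


@dataclass
class PreferencePair:
    """One expert preference between outputs 'a' and 'b' (hypothetical shape)."""
    pair_id: str
    expert_choice: str  # "a" or "b", chosen by the domain expert
    judge_choice: str   # "a" or "b", chosen by the judge being evaluated


def pass_rate(results: list[BinaryResult]) -> float:
    """Fraction of tasks whose binary verifier returned pass."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.passed for r in results) / len(results)


def rubric_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-skill rubric scores (weights are an assumption)."""
    total_weight = sum(weights[skill] for skill in scores)
    if total_weight == 0:
        raise ValueError("rubric weights sum to zero")
    return sum(scores[skill] * weights[skill] for skill in scores) / total_weight


def preference_agreement(pairs: list[PreferencePair]) -> float:
    """Fraction of pairs where the judge picks the same output as the expert."""
    if not pairs:
        raise ValueError("no pairs to score")
    return sum(p.expert_choice == p.judge_choice for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data for illustration only; these are not CREW results.
    checks = [BinaryResult("t1", True), BinaryResult("t2", False)]
    print(pass_rate(checks))
```

Binary checks suit tasks with a single correct answer, such as a categorized transaction, while rubrics and preference pairs suit open-ended outputs such as pitch decks or front-end builds, where quality is graded rather than matched.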

Leaderboard

Results at evals.metaphi.ai/crew/leaderboard

About

Metaphi is an applied AI research lab founded with the mission of scaling out RL environments for long-horizon agents.

We partner with the world's leading domain experts to curate our environments and to train in-house reward models for programmatic verification of autonomous agents.

Website: metaphi.ai