Metaphi, Inc

We introduce CREW, the Cross function Enterprise Work Index, to evaluate frontier AI models on long-horizon enterprise tasks.

CREW-Agents

| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
|---|---|---|---|---|---|
| Fin Agent | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| Enterprise Knowledge Agent | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| Front-end Agent | Senior frontend engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
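
To make the three verifier styles named above concrete (binary pass/fail, skill-based rubrics, preference pairs), here is a minimal sketch of how each could be turned into a score. It is an illustration only, not CREW's scoring code: the function names, the `PreferencePair` type, the rubric weights, and the `scorer` callable are all hypothetical, and the actual verifiers may aggregate results differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass
class PreferencePair:
    """Hypothetical record: an expert preferred `chosen` over `rejected`."""
    chosen: str
    rejected: str


def pass_rate(results: Sequence[bool]) -> float:
    """Binary pass/fail: fraction of tasks whose programmatic check passed."""
    if not results:
        raise ValueError("no results to score")
    return sum(results) / len(results)


def rubric_score(ratings: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Skill-based rubric: weighted mean of per-skill ratings (assumed in [0, 1])."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[skill] * w for skill, w in weights.items()) / total_weight


def preference_agreement(
    pairs: Sequence[PreferencePair],
    scorer: Callable[[str], float],
) -> float:
    """Preference pairs: share of pairs where `scorer` ranks the expert's choice higher."""
    if not pairs:
        raise ValueError("no preference pairs to score")
    hits = sum(scorer(p.chosen) > scorer(p.rejected) for p in pairs)
    return hits / len(pairs)
```

In this sketch, `pass_rate` would correspond to a programmatic verifier run once per task, while `preference_agreement` measures how often an assumed scoring function (for example, a reward model) agrees with expert judgments.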

Leaderboard

Results at evals.metaphi.ai/crew/leaderboard

About

Metaphi is an applied AI research lab founded with the mission of scaling out RL environments for long-horizon agents.

We partner with the world's leading domain experts to curate our environments and to train in-house reward models for programmatic verification of autonomous agents.

Website: metaphi.ai
