Enterprise Figma files to production-grade React code, evaluated using pairwise preference alignment and test-time ground truth from VLMs.
Metaphi, Inc
We introduce CREW (Cross-Function Enterprise Work Index) to evaluate frontier AI models on long-horizon enterprise tasks.
CREW-Agents
| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
|---|---|---|---|---|---|
| Fin Agent | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| Enterprise Knowledge Agent | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| Front-end Agent | Senior front-end engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
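The table names two kinds of verifiers: binary pass/fail checks and preference pairs. The sketch below shows one plausible way to aggregate each into a score. It is illustrative only: the `PreferencePair` record, its `"a"`/`"b"`/`"tie"` encoding, and the ties-count-as-half convention are hypothetical assumptions, not the CREW scoring method or its data schema.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical record: an expert compared outputs from model_a and model_b.
    model_a: str
    model_b: str
    winner: str  # assumed encoding: "a", "b", or "tie"


def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks whose programmatic verifier returned pass."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def win_rate(pairs: list[PreferencePair], model: str) -> float:
    """Share of comparisons involving `model` that it won; ties count as half (assumption)."""
    score = 0.0
    n = 0
    for p in pairs:
        if model not in (p.model_a, p.model_b):
            continue  # this comparison does not involve the model
        n += 1
        if p.winner == "tie":
            score += 0.5
        elif (p.winner == "a" and p.model_a == model) or (
            p.winner == "b" and p.model_b == model
        ):
            score += 1.0
    return score / n if n else 0.0


if __name__ == "__main__":
    print(pass_rate([True, False, True, True]))  # 0.75
    pairs = [
        PreferencePair("m1", "m2", "a"),
        PreferencePair("m2", "m1", "tie"),
        PreferencePair("m1", "m3", "b"),
    ]
    print(win_rate(pairs, "m1"))  # 0.5
```

A binary verifier reduces to a simple pass rate, while preference pairs only carry relative information. A win rate over the comparisons a model appears in is therefore the most direct summary.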
Leaderboard
Results at evals.metaphi.ai/crew/leaderboard
About
Metaphi is an applied AI research lab founded with the mission of scaling out RL environments for long-horizon agents.
We partner with leading domain experts to curate our environments, and we train in-house reward models for programmatic verification of autonomous agents.
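The card does not say how these reward models are trained. A common approach for learning a reward model from preference pairs is the Bradley-Terry objective. The minimal sketch below assumes that objective, which may differ from Metaphi's actual method. It models P(chosen beats rejected) as sigmoid(r_chosen - r_rejected) and minimizes the mean negative log-likelihood. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def bradley_terry_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen output is preferred."""
    return sigmoid(r_chosen - r_rejected)


def pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean of -log sigmoid(r_chosen - r_rejected) over (chosen, rejected) reward pairs."""
    if not pairs:
        return 0.0
    # -log(sigmoid(d)) == softplus(-d), which stays stable for large margins.
    return sum(softplus(-(rc - rr)) for rc, rr in pairs) / len(pairs)


if __name__ == "__main__":
    print(bradley_terry_prob(0.0, 0.0))            # 0.5
    print(pairwise_loss([(2.0, 0.0), (0.5, 1.0)]))
```

The loss shrinks as the reward model scores the expert-preferred output above the rejected one by a wider margin. Only reward differences matter, so a constant shift in all rewards leaves the loss unchanged.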
Website: metaphi.ai
Models (0): none public yet.
Datasets (6)
| Dataset | Rows | Downloads |
|---|---|---|
| metaphilabs/credit-underwriting-preview | 38 | 65 |
| metaphilabs/frontend-figma-to-code | 37 | 38 |
| metaphilabs/credit-underwriting-commercial | 504 | 12 |
| metaphilabs/remotion-video-preferences | 279 | 4 |
| metaphilabs/figma2code-expert-preferences | 147 | 6 |
| metaphilabs/remotion-video-gen | 45 | 5 |