Yifan Mai

yifanmai

Recent Activity

new activity 7 days ago

evaleval/EEE_datastore:Update HELM to schema version v0.2.2

updated a dataset 7 days ago

evaleval/EEE_datastore

updated a dataset 8 days ago

stanford-crfm/arabic-enterprise

authored 9 papers 17 days ago

VHELM: A Holistic Evaluation of Vision Language Models

Paper • 2410.07112 • Published Oct 9, 2024 • 3

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Paper • 2407.17436 • Published Jul 11, 2024

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

Paper • 2410.22456 • Published Oct 29, 2024

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

Paper • 2502.14301 • Published Feb 20, 2025 • 3

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Paper • 2503.05731 • Published Feb 19, 2025 • 3

Judging LLMs on a Simplex

Paper • 2505.21972 • Published May 28, 2025 • 1

AHELM: A Holistic Evaluation of Audio-Language Models

Paper • 2508.21376 • Published Aug 29, 2025 • 9

Structured Prompting Enables More Robust Evaluation of Language Models

Paper • 2511.20836 • Published Nov 25, 2025

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

Paper • 2510.11977 • Published Oct 13, 2025

authored 2 papers about 2 years ago

Holistic Evaluation of Language Models

Paper • 2211.09110 • Published Nov 16, 2022 • 1

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18, 2024 • 13

authored a paper over 2 years ago

Holistic Evaluation of Text-To-Image Models

Paper • 2311.04287 • Published Nov 7, 2023 • 15