HAE-RAE

non-profit

AI & ML interests

None defined yet.

Recent Activity

Cartinoe5930 authored a paper 6 days ago

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

seungone authored a paper 6 days ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Cartinoe5930 authored a paper 6 days ago

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

Papers

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

authored a paper 6 days ago

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Paper • 2601.06165 • Published Jan 7 • 16

authored a paper 6 days ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6

authored a paper 6 days ago

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

Paper • 2604.13058 • Published Mar 18 • 2

authored a paper 6 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 10 days ago • 77

authored a paper 7 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 10 days ago • 77

submitted a paper to Daily Papers 7 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 10 days ago • 77

authored a paper 9 days ago

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Paper • 2605.05662 • Published 12 days ago • 11

submitted a paper to Daily Papers 11 days ago

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Paper • 2605.05662 • Published 12 days ago • 11

updated a dataset about 1 month ago

HAERAE-HUB/HAERAE-VISION

Viewer • Updated Apr 18 • 653 • 88 • 13

updated a dataset about 1 month ago

HAERAE-HUB/KMMMU

Viewer • Updated Apr 16 • 3.45k • 368 • 13

submitted a paper to Daily Papers about 2 months ago

Composer 2 Technical Report

Paper • 2603.24477 • Published Mar 25 • 15

updated a dataset about 2 months ago

HAERAE-HUB/KMMMU

Viewer • Updated Apr 16 • 3.45k • 368 • 13

updated a Space 3 months ago

README

authored a paper 3 months ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Paper • 2602.06291 • Published Feb 6 • 24

submitted a paper to Daily Papers 3 months ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Paper • 2602.06291 • Published Feb 6 • 24

updated a dataset 4 months ago

HAERAE-HUB/Ko-PIQA

Viewer • Updated Jan 13 • 441 • 27 • 3

authored a paper 4 months ago

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Paper • 2601.06165 • Published Jan 7 • 16

submitted a paper to Daily Papers 4 months ago

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Paper • 2601.06165 • Published Jan 7 • 16

published a dataset 4 months ago

HAERAE-HUB/HAERAE-VISION

Viewer • Updated Apr 18 • 653 • 88 • 13

authored a paper 4 months ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Paper • 2510.24081 • Published Oct 28, 2025 • 24