Human-Centered Eval

community

Activity Feed

AI & ML interests

Model evaluation, Benchmark analysis, Generative language models, Measurement theories

Recent Activity

Salomeee updated a dataset 23 days ago

human-centered-eval/OpenEval

Salomeee published a dataset 3 months ago

human-centered-eval/OpenEval

Salomeee

updated a dataset 23 days ago

human-centered-eval/OpenEval

Viewer • Updated 22 days ago • 20.3M • 845 • 6

Salomeee

published a dataset 3 months ago

human-centered-eval/OpenEval

Viewer • Updated 22 days ago • 20.3M • 845 • 6

Phosphor-Bai

authored 7 papers 4 months ago

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Paper • 2402.14008 • Published Feb 21, 2024 • 1

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

Paper • 2505.16483 • Published May 22, 2025 • 10

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 99

GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion

Paper • 2502.11471 • Published Feb 17, 2025 • 2

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Paper • 2305.08322 • Published May 15, 2023

FaithLens: Detecting and Explaining Faithfulness Hallucination

Paper • 2512.20182 • Published Dec 23, 2025 • 9

InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs

Paper • 2601.06666 • Published Jan 10, 2026 • 1

sangttruong

authored 2 papers 11 months ago

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Paper • 2506.02314 • Published Jun 2, 2025

Reliable and Efficient Amortized Model-based Evaluation

Paper • 2503.13335 • Published Mar 17, 2025

ziangxiao

authored 4 papers almost 2 years ago

Rethinking Model Evaluation as Narrowing the Socio-Technical Gap

Paper • 2306.03100 • Published Jun 1, 2023

ECBD: Evidence-Centered Benchmark Design for NLP

Paper • 2406.08723 • Published Jun 13, 2024

Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

Paper • 2407.05502 • Published Jul 7, 2024

Improving Context-Aware Preference Modeling for Language Models

Paper • 2407.14916 • Published Jul 20, 2024 • 4

sangttruong

authored a paper about 2 years ago

Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Paper • 2403.02715 • Published Mar 5, 2024 • 3

ziangxiao

authored a paper over 2 years ago

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking

Paper • 2402.05880 • Published Feb 8, 2024 • 3

ziangxiao

authored a paper almost 3 years ago

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

Paper • 2306.12509 • Published Jun 21, 2023 • 15

sangttruong

authored a paper almost 3 years ago

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Paper • 2306.11698 • Published Jun 20, 2023 • 13

Team members 7
