Yale University

university

Verified

https://www.yale.edu/

Recent Activity

ngocbh submitted a paper 14 minutes ago

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

yilunzhao authored a paper 5 days ago

ANCHOR: Branch-Point Data Generation for GUI Agents

yilunzhao authored a paper 5 days ago

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Papers

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

submitted a paper to Daily Papers 14 minutes ago

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

Paper • 2605.09649 • Published 2 days ago • 3

submitted a paper to Daily Papers 5 days ago

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

Paper • 2605.04018 • Published 7 days ago • 35

authored a paper about 2 months ago

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Paper • 2603.23638 • Published Mar 24 • 11

submitted a paper to Daily Papers about 2 months ago

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Paper • 2603.23638 • Published Mar 24 • 11

submitted a paper to Daily Papers 2 months ago

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Paper • 2603.12246 • Published Mar 12 • 5

authored a paper 2 months ago

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Paper • 2603.09723 • Published Mar 10 • 7

published a Space 2 months ago

README

submitted 2 papers to Daily Papers 2 months ago

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Paper • 2603.02510 • Published Mar 3 • 3

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

Paper • 2602.20629 • Published Feb 24 • 5

authored a paper 3 months ago

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Paper • 2602.16990 • Published Feb 19 • 11

submitted a paper to Daily Papers 3 months ago

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Paper • 2602.16990 • Published Feb 19 • 11

authored a paper 3 months ago

References Improve LLM Alignment in Non-Verifiable Domains

Paper • 2602.16802 • Published Feb 18 • 2

submitted a paper to Daily Papers 3 months ago

References Improve LLM Alignment in Non-Verifiable Domains

Paper • 2602.16802 • Published Feb 18 • 2

submitted a paper to Daily Papers 3 months ago

ResearchGym: Evaluating Language Model Agents on Real-World AI Research

Paper • 2602.15112 • Published Feb 16 • 21

authored a paper 4 months ago

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Paper • 2601.03425 • Published Jan 6 • 17

submitted a paper to Daily Papers 4 months ago

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Paper • 2601.03425 • Published Jan 6 • 17

submitted a paper to Daily Papers 5 months ago

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

Paper • 2512.20352 • Published Dec 23, 2025 • 3

authored a paper 7 months ago

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

Paper • 2510.08886 • Published Oct 10, 2025 • 20

authored a paper 11 months ago

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94

authored a paper 12 months ago

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Paper • 2505.20650 • Published May 27, 2025 • 17