Austin Xu

austinxu87

Recent Activity

upvoted a paper about 2 months ago

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

Paper • 2601.14652 • Published Jan 21 • 4

upvoted a collection 5 months ago

FARE

FARE are Salesforce AI Research's open multi-task evaluator models. • 3 items • Updated 11 days ago • 2

updated a collection 5 months ago

FARE

FARE are Salesforce AI Research's open multi-task evaluator models. • 3 items • Updated 11 days ago • 2

upvoted a paper 5 months ago

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains

Paper • 2510.17793 • Published Oct 20, 2025 • 4

commented a paper 5 months ago

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains

Paper • 2510.17793 • Published Oct 20, 2025 • 4

updated 2 models 5 months ago

Salesforce/FARE-8B

8B • Updated Oct 21, 2025 • 3

Salesforce/FARE-20B

4.76M • Updated Oct 21, 2025 • 3

published 2 models 5 months ago

Salesforce/FARE-20B

4.76M • Updated Oct 21, 2025 • 3

Salesforce/FARE-8B

8B • Updated Oct 21, 2025 • 3

updated a collection 5 months ago

FARE

FARE are Salesforce AI Research's open multi-task evaluator models. • 3 items • Updated 11 days ago • 2

authored a paper 5 months ago

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Paper • 2510.13744 • Published Oct 15, 2025 • 6

upvoted a paper 5 months ago

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Paper • 2510.14240 • Published Oct 16, 2025 • 13

New activity in Salesforce/Hard2Verify 5 months ago

Update task category to `question-answering`, refine `sample_usage`, add tags, and fix typo

#2 opened 5 months ago by

liked a dataset 5 months ago

Salesforce/Hard2Verify

Viewer • Updated Oct 17, 2025 • 200 • 116 • 7

upvoted a paper 5 months ago

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Paper • 2510.13744 • Published Oct 15, 2025 • 6

published a dataset 5 months ago

Salesforce/Hard2Verify

Viewer • Updated Oct 17, 2025 • 200 • 116 • 7

updated a dataset 5 months ago

Salesforce/Hard2Verify

Viewer • Updated Oct 17, 2025 • 200 • 116 • 7

updated a collection 5 months ago

FARE

FARE are Salesforce AI Research's open multi-task evaluator models. • 3 items • Updated 11 days ago • 2

authored a paper 6 months ago

SFR-RAG: Towards Contextually Faithful LLMs

Paper • 2409.09916 • Published Sep 16, 2024 • 1