AI Co-Scientist

classroom

AI & ML interests

None defined yet.

Recent Activity

CohenQu

updated a dataset 11 days ago

ACSci/v3-eval-judge-gpt-oss-20b

Viewer • Updated 11 days ago • 71.2k • 365

CohenQu

published a dataset 12 days ago

ACSci/v3-eval-judge-gpt-oss-20b

Viewer • Updated 11 days ago • 71.2k • 365

mkhalifa

submitted a paper to Daily Papers 3 months ago

Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation

Paper • 2601.14691 • Published Jan 21 • 1

Ximing

submitted a paper to Daily Papers 3 months ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 111

Ximing

authored a paper 4 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 231

mkhalifa

authored a paper 11 months ago

Process Reward Models That Think

Paper • 2504.16828 • Published Apr 23, 2025 • 19

CohenQu

authored 3 papers about 1 year ago

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25, 2024 • 3

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

Paper • 2310.18247 • Published Oct 27, 2023

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10, 2025 • 48

aviralku

authored 11 papers about 1 year ago

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

Paper • 2310.10639 • Published Oct 16, 2023 • 3

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Paper • 2402.02651 • Published Feb 5, 2024

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Paper • 2406.14532 • Published Jun 20, 2024

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25, 2024 • 3

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Paper • 2410.08146 • Published Oct 10, 2024 • 1

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Paper • 2410.13816 • Published Oct 17, 2024 • 1

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Paper • 2412.07762 • Published Dec 10, 2024

Team members 5
