reward modeling

classroom

Activity Feed

AI & ML interests

None defined yet.

tengyangx

authored 7 papers about 1 month ago

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4, 2024 • 62

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Paper • 2405.21046 • Published May 31, 2024 • 4

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

Paper • 2406.12845 • Published Jun 18, 2024 • 1

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

Paper • 2505.15311 • Published May 21, 2025

POLCA: Stochastic Generative Optimization with LLM

Paper • 2603.14769 • Published Mar 16, 2026 • 23

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Paper • 2603.19987 • Published Mar 20, 2026 • 9

Understanding Behavior Cloning with Action Quantization

Paper • 2603.20538 • Published Mar 20, 2026 • 2

tengyangx

submitted 2 papers to Daily Papers about 1 month ago

Understanding Behavior Cloning with Action Quantization

Paper • 2603.20538 • Published Mar 20, 2026 • 2

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Paper • 2603.19987 • Published Mar 20, 2026 • 9

yifeihe3

authored 6 papers 2 months ago

Semi-Supervised Reward Modeling via Iterative Self-Training

Paper • 2409.06903 • Published Sep 10, 2024 • 1

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks

Paper • 2410.18210 • Published Oct 23, 2024

authored 3 papers 12 months ago

Fractured Chain-of-Thought Reasoning

Paper • 2505.12992 • Published May 19, 2025 • 23

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Paper • 2505.10554 • Published May 15, 2025 • 119

Scalable Chain of Thoughts via Elastic Reasoning

Paper • 2505.05315 • Published May 8, 2025 • 26

hendrydong

authored 2 papers about 1 year ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5, 2025 • 25

BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Paper • 2502.03860 • Published Feb 6, 2025 • 25

Team members 5

cornfieldrm's activity