Hugo Laurençon

HugoLaurencon

upvoted a paper 22 days ago

Trust Region On-Policy Distillation

Paper • 2606.01249 • Published 27 days ago • 44

upvoted a paper 24 days ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published May 24 • 35

upvoted 3 papers about 1 month ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Paper • 2605.21468 • Published May 20 • 51

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Paper • 2605.14747 • Published May 14 • 147

upvoted 2 papers about 2 months ago

Can Muon Fine-tune Adam-Pretrained Models?

Paper • 2605.10468 • Published May 11 • 6

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published May 9 • 82

upvoted 2 papers 4 months ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published Feb 12 • 68

Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

Paper • 2602.11964 • Published Feb 12 • 13

upvoted 2 papers 5 months ago

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Paper • 2602.05261 • Published Feb 5 • 54

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 105

upvoted a paper 6 months ago

Deep Delta Learning

Paper • 2601.00417 • Published Jan 1 • 34

upvoted 2 papers 7 months ago

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

Paper • 2505.17508 • Published May 23, 2025 • 8

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published Nov 17, 2025 • 140

upvoted a paper 8 months ago

ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

Paper • 2511.02415 • Published Nov 4, 2025 • 5

upvoted 4 papers 9 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 517

Quantile Advantage Estimation for Entropy-Safe Reasoning

Paper • 2509.22611 • Published Sep 26, 2025 • 119

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

Paper • 2509.07966 • Published Sep 9, 2025 • 5

upvoted a paper 10 months ago

ΔL Normalization: Rethink Loss Aggregation in RLVR

Paper • 2509.07558 • Published Sep 9, 2025 • 7