
Hao Jiang

Lutalica

https://rewindl.github.io/

RewindL

AI & ML interests

Multimodal LLMs, LLM Reasoning, Reinforcement Learning, Efficient Inference

Recent Activity

authored a paper 25 days ago

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

upvoted a paper 26 days ago

Pyramid Texture Filtering

authored a paper 29 days ago

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use



authored a paper 25 days ago

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

Paper • 2605.29447 • Published about 1 month ago • 21

upvoted a paper 26 days ago

Pyramid Texture Filtering

Paper • 2305.06525 • Published May 11, 2023 • 1

authored 3 papers 29 days ago

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Paper • 2602.02160 • Published Feb 2 • 14

Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

Paper • 2605.28109 • Published May 27 • 23

Pyramid Texture Filtering

Paper • 2305.06525 • Published May 11, 2023 • 1

upvoted a paper 30 days ago

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

Paper • 2605.29447 • Published about 1 month ago • 21

upvoted a paper about 1 month ago

Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

Paper • 2605.28109 • Published May 27 • 23

commented on 2 papers 4 months ago

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

Paper • 2603.10160 • Published Mar 10 • 26

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Paper • 2603.04800 • Published Mar 5 • 25

upvoted a paper 4 months ago

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Paper • 2603.04800 • Published Mar 5 • 25

commented on a paper 5 months ago

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Paper • 2602.02160 • Published Feb 2 • 14

upvoted a paper 5 months ago

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Paper • 2602.02160 • Published Feb 2 • 14

commented on a paper 9 months ago

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Paper • 2509.26313 • Published Sep 30, 2025 • 5

upvoted a paper 12 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 90

commented on a paper about 1 year ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99

upvoted a paper about 1 year ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141

commented on a paper about 1 year ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141

New activity in monology/pile-uncopyrighted almost 2 years ago

Format issue when loading dataset

#1 opened over 2 years ago by antoine314
