junmingyang

jmyang

https://junming-yang.github.io/

junming-yang

AI & ML interests

LLM Alignment, VLM

Recent Activity

authored a paper 18 days ago

Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models

authored a paper 18 days ago

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

upvoted a paper 18 days ago

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Organizations

None yet

upvoted 2 papers 18 days ago

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Paper • 2606.07074 • Published 23 days ago • 12

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Paper • 2606.09669 • Published 20 days ago • 46

upvoted a paper 26 days ago

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Paper • 2605.29796 • Published about 1 month ago • 25

upvoted a paper about 1 month ago

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 34

upvoted a paper 3 months ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 204

upvoted a collection 4 months ago

Meta APO

Collection • Model of MetaAPO https://arxiv.org/abs/2509.23371 • 6 items • Updated Feb 28 • 2

upvoted a paper 7 months ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 63

upvoted 4 papers 9 months ago

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16, 2024 • 17

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

Paper • 2509.23371 • Published Sep 27, 2025 • 6

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Paper • 2509.11543 • Published Sep 15, 2025 • 50

upvoted a paper 11 months ago

Pixels, Patterns, but No Poetry: To See The World like Humans

Paper • 2507.16863 • Published Jul 21, 2025 • 69

upvoted a paper about 1 year ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 311

upvoted an article about 1 year ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • NormalUhr • Feb 11, 2025 • 126

upvoted 3 papers over 1 year ago

upvoted an article almost 2 years ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 417

upvoted 2 papers almost 2 years ago

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6, 2024 • 85

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61