Zhang

WenyaoZhang

wy_zhang@sjtu.edu.cn

Recent Activity

authored 13 papers 10 days ago

[CLS] Token is All You Need for Zero-Shot Semantic Segmentation

Paper • 2304.06212 • Published Apr 13, 2023

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Paper • 2506.03135 • Published Jun 3, 2025 • 40

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

Paper • 2510.09320 • Published Oct 10, 2025 • 3

Reasoning in Space via Grounding in the World

Paper • 2510.13800 • Published Oct 15, 2025 • 15

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

Paper • 2407.02077 • Published Jul 2, 2024

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Paper • 2602.10098 • Published Feb 10 • 22

ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models

Paper • 2601.12428 • Published Jan 18

AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps

Paper • 2604.11135 • Published Apr 13

Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining

Paper • 2604.16391 • Published Mar 27 • 4

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Paper • 2606.03985 • Published 27 days ago • 41

LIMMT: Less is More for Motion Tracking

Paper • 2606.06953 • Published 24 days ago • 16

MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models

Paper • 2606.13515 • Published 18 days ago • 2

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Paper • 2606.19531 • Published 12 days ago • 21

upvoted a paper 20 days ago

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

Paper • 2606.09811 • Published 21 days ago • 15

upvoted a paper 21 days ago

LIMMT: Less is More for Motion Tracking

Paper • 2606.06953 • Published 24 days ago • 16

upvoted a collection 22 days ago

VLA-JEPA

VLA-JEPA model checkpoints (LIBERO, Pretrain, SimplerEnv) • 3 items • Updated May 28 • 14

upvoted a paper about 1 month ago

FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

Paper • 2605.13757 • Published May 13 • 21

upvoted a paper 2 months ago

Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining

Paper • 2604.16391 • Published Mar 27 • 4

submitted a paper to Daily Papers 2 months ago

Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining

Paper • 2604.16391 • Published Mar 27 • 4

liked a model 2 months ago

zbzzbz/DeFI

Updated Mar 6 • 38 • 1