Xudong Wang

xudongw

https://people.eecs.berkeley.edu/~xdwang/index.html

frank-xwang

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Paper • 2601.16973 • Published 5 days ago • 30

upvoted a collection about 2 months ago

CoVT: Chain-of-Visual-Thought

Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6

upvoted 2 papers 2 months ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24, 2025 • 29

UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

Paper • 2511.13714 • Published Nov 17, 2025 • 12

upvoted a paper 3 months ago

Constantly Improving Image Models Need Constantly Improving Benchmarks

Paper • 2510.15021 • Published Oct 16, 2025 • 7

upvoted a paper 5 months ago

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40

upvoted a paper 9 months ago

Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22, 2025 • 63

upvoted a paper 10 months ago

TULIP: Towards Unified Language-Image Pretraining

Paper • 2503.15485 • Published Mar 19, 2025 • 49

upvoted a paper almost 2 years ago

InstanceDiffusion: Instance-level Control for Image Generation

Paper • 2402.03290 • Published Feb 5, 2024 • 1