JPShi

SJP2022

AI & ML interests

None yet

Recent Activity

liked a dataset 16 days ago

mjuicem/StreamingBench

Organizations

None yet

upvoted a paper 8 days ago

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Paper • 2606.18249 • Published 10 days ago • 14

upvoted a paper 2 months ago

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

Paper • 2604.10425 • Published Apr 12 • 3

upvoted 3 papers 3 months ago

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Paper • 2603.12146 • Published Mar 12 • 5

Can Vision-Language Models Solve the Shell Game?

Paper • 2603.08436 • Published Mar 9 • 39

WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing

Paper • 2603.11593 • Published Mar 12 • 25

upvoted a paper 4 months ago

CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization

Paper • 2603.06449 • Published Mar 6 • 6

upvoted a paper 5 months ago

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Paper • 2601.07290 • Published Jan 12 • 7

upvoted a collection 5 months ago

VideoLoom

Model Zoo for VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding • 3 items • Updated Jan 13 • 1

upvoted a collection almost 2 years ago

LLaVA-Video

Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 8 items • Updated Feb 21, 2025 • 63