Tony Zhao PRO

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal & Generative AI

Recent Activity

upvoted an article about 18 hours ago

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

published an article about 18 hours ago

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

upvoted an article 1 day ago

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

upvoted an article about 18 hours ago

Article

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

omlab • about 18 hours ago • 10

upvoted an article 1 day ago

Article

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

omlab • 1 day ago • 12

upvoted an article 2 days ago

Article

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

omlab • 2 days ago • 10

upvoted a paper 27 days ago

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

upvoted a collection 2 months ago

Qwen3.6

Collection

4 items • Updated Apr 22 • 420

upvoted a paper 9 months ago

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

Paper • 2509.25916 • Published Sep 30, 2025 • 6

upvoted a paper about 1 year ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10, 2025 • 36

upvoted 2 articles over 1 year ago

Article

Improving Object Detection through Reinforcement Learning with VLM-R1

omlab • Mar 25, 2025 • 3

Article

Trials, Errors, and Breakthroughs: Our Rocky Road to OVD SOTA with Reinforcement Learning

omlab • Mar 25, 2025 • 3

upvoted a collection over 1 year ago

VLM-R1-models

Collection

A collection of VLM-R1 Models • 7 items • Updated Jul 11, 2025 • 9

upvoted an article over 1 year ago

Article

Replicating DeepSeek R1 for Information Extraction

Ihor • Jan 31, 2025 • 44

upvoted 9 papers over 1 year ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 454

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 3

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Paper • 2306.11300 • Published Jun 20, 2023 • 2

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Paper • 2403.06892 • Published Mar 11, 2024 • 2

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Paper • 2312.15043 • Published Dec 22, 2023 • 2

VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations

Paper • 2207.00221 • Published Jul 1, 2022 • 2

OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network

Paper • 2209.05946 • Published Sep 10, 2022 • 2

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Paper • 2407.04923 • Published Jul 6, 2024 • 2

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 162
