
GongDengxian

godx7

AI & ML interests

None yet

Recent Activity

upvoted a paper 4 days ago

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Paper • 2606.19534 • Published 9 days ago • 61

upvoted a paper 16 days ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Paper • 2606.07436 • Published 21 days ago • 24

upvoted a paper 21 days ago

Towards One-to-Many Temporal Grounding

Paper • 2606.06294 • Published 22 days ago • 7

upvoted a paper 5 months ago

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published Jan 22 • 44

upvoted a collection 8 months ago

Describe Anything

Collection

Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 14 days ago • 63

upvoted 2 papers 9 months ago

DeH4R: A Decoupled and Hybrid Method for Road Network Graph Extraction

Paper • 2508.13669 • Published Aug 19, 2025 • 1

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

Paper • 2509.16972 • Published Sep 21, 2025 • 2

upvoted a paper 10 months ago

SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment

Paper • 2507.02705 • Published Jul 3, 2025 • 2

upvoted a paper 12 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24, 2025 • 27

upvoted a collection 12 months ago

Qwen2.5-VL

Collection

Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 566

upvoted a paper about 1 year ago

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published Apr 14, 2025 • 27

upvoted a paper almost 2 years ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54
