# VGR: Visual Grounded Reasoning

## Overview

This is the home page for VGR (Visual Grounded Reasoning), a novel multimodal large language model (MLLM) designed to enhance fine-grained visual perception and reasoning capabilities. Unlike traditional MLLMs, VGR enables selective attention to visual regions during inference, improving accuracy in complex visual reasoning tasks. It introduces a self-driven selective visual replay mechanism and is trained on a large-scale dataset (VGR-SFT) that integrates visual grounding and language deduction.

- [arXiv paper](https://arxiv.org/pdf/2506.11991)
- [Data repository](https://huggingface.co/datasets/BytedanceDouyinContent/VGR)