Improve model card: add metadata and links

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -1,8 +1,21 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
-# CoVT Checkpoint (Depth Aligned)
 
-## Model Description
-This CoVT checkpoint is aligned with **4 Depth tokens**.
+# Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
+
+This repository hosts a CoVT checkpoint, as presented in the paper [Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens](https://huggingface.co/papers/2511.19418).
+
+**Project Page**: https://wakalsprojectpage.github.io/comt-website
+**Code**: https://github.com/Wakals/CoMT
+
+## Overview of CoVT
+
+Rather than restricting VLM reasoning to a discrete language space with limited representational capacity, **CoVT** forms a visual thought chain that enables VLMs to reason in continuous visual space. By introducing *continuous visual tokens* that encode perceptual cues (e.g., segmentation, depth, instance, and edge structure), CoVT composes *chains of textual and visual thoughts* that link semantic reasoning with perceptual grounding. These visual “thought chains” bridge language and vision, enabling fine-grained understanding, spatial precision, and geometric awareness beyond the reach of text-based reasoning.
+
+## CoVT Checkpoint (Depth Aligned)
+
+This CoVT checkpoint is aligned with **4 Depth tokens**.
 These task-specific tokens are integrated into the model’s embedding space to enhance depth-awareness.
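The updated card states that the 4 depth tokens are integrated into the model's embedding space. As background on what that typically means mechanically, extending a pretrained embedding table for extra task-specific tokens amounts to appending newly initialized rows. A minimal NumPy sketch (illustrative only, not the CoVT implementation; `add_task_tokens` and the mean-initialization heuristic are assumptions):

```python
import numpy as np

def add_task_tokens(embedding_matrix: np.ndarray, num_new_tokens: int) -> np.ndarray:
    """Append rows for new task-specific tokens to an embedding table.

    New rows start at the mean of the existing embeddings, a common
    heuristic when adding special tokens to a pretrained vocabulary.
    """
    mean = embedding_matrix.mean(axis=0, keepdims=True)   # (1, dim)
    new_rows = np.repeat(mean, num_new_tokens, axis=0)    # (num_new_tokens, dim)
    return np.concatenate([embedding_matrix, new_rows], axis=0)

# Toy vocabulary: 10 tokens with 8-dim embeddings; append 4 depth tokens.
emb = np.random.default_rng(0).normal(size=(10, 8))
extended = add_task_tokens(emb, num_new_tokens=4)
print(extended.shape)  # (14, 8)
```

With `transformers` (declared in the new `library_name` metadata), the analogous step is adding the tokens to the tokenizer and resizing the model's token embeddings; the new rows are then trained during alignment.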