FYYDCC committed on
Commit 765f2c5 · verified · 1 Parent(s): 0278297

Create README.md

Files changed (1): README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ ---
+ license: mit
+ ---
+ # IVT-LR (Chameleon)
+
+ ## Overview
+
+ This model was presented in the paper [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://huggingface.co/papers/2510.12603).
+
+ Interleaved Vision-Text Latent Reasoning (IVT-LR) is the first VLM framework that unifies textual and visual representations in the latent space and implements multimodal latent reasoning. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: **latent text** and **latent vision**. We further introduce a progressive multi-stage training strategy to enable MLLMs to perform the above multimodal latent reasoning steps.
+
+ ---
+
+ ## Usage
+
+ This repository provides pretrained Chameleon models for IVT-LR on **M3CoT** and **ScienceQA** datasets.
+
+ To see detailed usage, including inference code and scripts for training, please refer to the [GitHub repository](https://github.com/ModalityDance/IVT-LR).
+
+ ---
+
+ ### Download Models
+
+ You can download the models directly from Hugging Face using `huggingface_hub`:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download Chameleon model trained on M3CoT
+ chameleon_m3cot_path = hf_hub_download("ModalityDance/IVTLR_CHAMELEON_M3COT", "model.pth")
+
+ # Download Chameleon model trained on ScienceQA
+ chameleon_sqa_path = hf_hub_download("ModalityDance/IVTLR_CHAMELEON_SQA", "model.pth")
+ ```
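
The two download calls in the README can be wrapped in a small helper that selects the checkpoint by dataset name. This is a minimal sketch, not part of the commit: the repo IDs and the `model.pth` filename are copied from the README, while the names `CHECKPOINTS`, `resolve_repo`, and `download_checkpoint` are hypothetical and introduced here only for illustration.

```python
from pathlib import Path
from typing import Optional

# Repo IDs and filename as given in the README; the dictionary keys are
# illustrative labels chosen for this sketch.
CHECKPOINTS = {
    "m3cot": "ModalityDance/IVTLR_CHAMELEON_M3COT",
    "scienceqa": "ModalityDance/IVTLR_CHAMELEON_SQA",
}
CHECKPOINT_FILENAME = "model.pth"


def resolve_repo(dataset: str) -> str:
    """Map a dataset label (case-insensitive) to its Hugging Face repo ID."""
    key = dataset.strip().lower()
    if key not in CHECKPOINTS:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of: {known}")
    return CHECKPOINTS[key]


def download_checkpoint(dataset: str, cache_dir: Optional[str] = None) -> Path:
    """Download the checkpoint for `dataset` and return its local path."""
    # Imported lazily so resolve_repo() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=resolve_repo(dataset),
        filename=CHECKPOINT_FILENAME,
        cache_dir=cache_dir,
    )
    return Path(local_path)
```

Calling `download_checkpoint("m3cot")` fetches the M3CoT checkpoint into the local Hugging Face cache and returns the file path. How the `.pth` file is then loaded into a model is not described in the README; the linked GitHub repository is the place to look for the inference code.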