---
license: mit
---

# IVT-LR (Chameleon)

## Overview

This model was presented in the paper [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://huggingface.co/papers/2510.12603).

Interleaved Vision-Text Latent Reasoning (IVT-LR) is the first VLM framework that unifies textual and visual representations in the latent space and performs multimodal latent reasoning. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: **latent text** and **latent vision**. We further introduce a progressive multi-stage training strategy that enables MLLMs to carry out these multimodal latent reasoning steps.

---

## Usage

This repository provides pretrained Chameleon models for IVT-LR on the **M3CoT** and **ScienceQA** datasets.

For detailed usage, including inference code and training scripts, please refer to the [GitHub repository](https://github.com/ModalityDance/IVT-LR).

---

### Download Models

You can download the models directly from Hugging Face using `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

# Download the Chameleon model trained on M3CoT
chameleon_m3cot_path = hf_hub_download("ModalityDance/IVTLR_CHAMELEON_M3COT", "model.pth")

# Download the Chameleon model trained on ScienceQA
chameleon_sqa_path = hf_hub_download("ModalityDance/IVTLR_CHAMELEON_SQA", "model.pth")
```