FYYDCC nielsr HF Staff commited on
Commit
8b0fd47
·
verified ·
1 Parent(s): 69dd2b3

Add link to paper (#2)

Browse files

- Add link to paper (97cdc741411051f4b2ad919d7a178b63f7e1bb3e)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -7,6 +7,8 @@ pipeline_tag: image-text-to-text
7
 
8
  ## Overview
9
 
 
 
10
  Interleaved Vision-Text Latent Reasoning (IVT-LR) is the first VLM framework that unifies textual and visual representations in the latent space and implements multimodal latent reasoning. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: **latent text** and **latent vision**. We further introduce a progressive multi-stage training strategy to enable MLLMs to perform the above multimodal latent reasoning steps.
11
 
12
  ---
 
7
 
8
  ## Overview
9
 
10
+ This model was presented in the paper [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://huggingface.co/papers/2510.12603).
11
+
12
  Interleaved Vision-Text Latent Reasoning (IVT-LR) is the first VLM framework that unifies textual and visual representations in the latent space and implements multimodal latent reasoning. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: **latent text** and **latent vision**. We further introduce a progressive multi-stage training strategy to enable MLLMs to perform the above multimodal latent reasoning steps.
13
 
14
  ---