Add link to paper

#2
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +2 -0
README.md CHANGED
```diff
@@ -7,6 +7,8 @@ pipeline_tag: image-text-to-text
 
 ## Overview
 
+This model was presented in the paper [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://huggingface.co/papers/2510.12603).
+
 Interleaved Vision-Text Latent Reasoning (IVT-LR) is the first VLM framework that unifies textual and visual representations in the latent space and implements multimodal latent reasoning. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: **latent text** and **latent vision**. We further introduce a progressive multi-stage training strategy to enable MLLMs to perform the above multimodal latent reasoning steps.
 
 ---
```