File size: 1,706 Bytes
d6d9199 9355d3f d6d9199 9355d3f 8a6f122 d6d9199 9355d3f d6d9199 9355d3f d6d9199 9355d3f d6d9199 9355d3f d6d9199 9355d3f d6d9199 9355d3f d6d9199 8a6f122 9355d3f d6d9199 8a6f122 9355d3f d6d9199 9355d3f d6d9199 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
---
library_name: transformers
license: cc
datasets:
- array/SAT
- multimodal-reasoning-lab/Zebra-CoT
- Video-R1/Video-R1-data
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
- **Repository:** [https://github.com/arijitray1993/mull-tokens]
- **Paper:** [https://arxiv.org/abs/2512.10941]
## How to Get Started with the Model
WORK IN PROGRESS: more details to be added soon!
It is highly recommended to install this version of transformers: https://github.com/arijitray1993/Mirage
```
git clone https://github.com/arijitray1993/Mirage
pip install -e ./transformers/.
```
Next, clone this repo: https://github.com/arijitray1993/mull-tokens.
We use a custom Qwen2.5 VL model. There is no change to the architecture, just some new tokens added.
```
% pip install qwen-vl-utils[decord]==0.0.8
import importlib
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
Qwen2_5_VLForConditionalGeneration = importlib.import_module(
'models.mmlatentdiscrete_qwen_vl'
).Qwen2_5_VLForConditionalGeneration
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("array/Qwen2.5-VL-MullGRPO")
processor = AutoProcessor.from_pretrained(
"array/Qwen2.5-VL-MullGRPO",
trust_remote_code=True
)
```
## Citation [optional]
```
@misc{ray2025mulltokensmodalityagnosticlatentthinking,
title={Mull-Tokens: Modality-Agnostic Latent Thinking},
author={Arijit Ray and Ahmed Abdelkader and Chengzhi Mao and Bryan A. Plummer and Kate Saenko and Ranjay Krishna and Leonidas Guibas and Wen-Sheng Chu},
year={2025},
eprint={2512.10941},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.10941},
}
```
|