---
library_name: transformers
license: cc
datasets:
- array/SAT
- multimodal-reasoning-lab/Zebra-CoT
- Video-R1/Video-R1-data
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
## Model Sources

- **Repository:** https://github.com/arijitray1993/mull-tokens
- **Paper:** https://arxiv.org/abs/2512.10941
## How to Get Started with the Model

**WORK IN PROGRESS:** more details to be added soon!
It is highly recommended to install the custom `transformers` fork from https://github.com/arijitray1993/Mirage:

```shell
git clone https://github.com/arijitray1993/Mirage
pip install -e ./transformers/.
```
Next, clone this repo: https://github.com/arijitray1993/mull-tokens.
We use a custom Qwen2.5-VL model. There is no change to the architecture; only some new tokens are added. You will also need the vision utilities:

```shell
pip install "qwen-vl-utils[decord]==0.0.8"
```
```python
import importlib

from transformers import AutoProcessor

# Load the custom model class from the mull-tokens repo
# (run this from the repo root so the `models` package is importable).
Qwen2_5_VLForConditionalGeneration = importlib.import_module(
    "models.mmlatentdiscrete_qwen_vl"
).Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained("array/Qwen2.5-VL-MullGRPO")
processor = AutoProcessor.from_pretrained(
    "array/Qwen2.5-VL-MullGRPO",
    trust_remote_code=True,
)
```
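Once the model and processor are loaded, inputs follow the standard Qwen2.5-VL chat format. Below is a minimal sketch of the expected message structure; the image path is a placeholder, and the commented pipeline assumes this custom model follows the stock Qwen2.5-VL generation API (not confirmed by this card):

```python
# Standard Qwen2.5-VL chat-message structure.
# "path/to/demo.jpg" is a placeholder — substitute your own image path or URL.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/demo.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Assuming the stock Qwen2.5-VL API, the usual inference pipeline would be:
#   text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
#   inputs = processor(text=[text], images=..., return_tensors="pt")
#   output_ids = model.generate(**inputs, max_new_tokens=128)
```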
## Citation
```bibtex
@misc{ray2025mulltokensmodalityagnosticlatentthinking,
      title={Mull-Tokens: Modality-Agnostic Latent Thinking},
      author={Arijit Ray and Ahmed Abdelkader and Chengzhi Mao and Bryan A. Plummer and Kate Saenko and Ranjay Krishna and Leonidas Guibas and Wen-Sheng Chu},
      year={2025},
      eprint={2512.10941},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.10941},
}
```