|
|
--- |
|
|
library_name: transformers |
|
|
license: cc |
|
|
datasets: |
|
|
- array/SAT |
|
|
- multimodal-reasoning-lab/Zebra-CoT |
|
|
- Video-R1/Video-R1-data |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-VL-7B-Instruct |
|
|
--- |
|
|
|
|
|
- **Repository:** [https://github.com/arijitray1993/mull-tokens] |
|
|
- **Paper:** [https://arxiv.org/abs/2512.10941] |
|
|
|
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
WORK IN PROGRESS: more details to be added soon! |
|
|
|
|
|
It is highly recommended to install this version of transformers: https://github.com/arijitray1993/Mirage |
|
|
|
|
|
``` |
|
|
git clone https://github.com/arijitray1993/Mirage |
|
|
pip install -e ./transformers/. |
|
|
``` |
|
|
|
|
|
Next, clone this repo: https://github.com/arijitray1993/mull-tokens. |
|
|
|
|
|
We use a custom Qwen2.5 VL model. There is no change to the architecture, just some new tokens added. |
|
|
|
|
|
``` |
|
|
% pip install qwen-vl-utils[decord]==0.0.8 |
|
|
|
|
|
import importlib |
|
|
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor |
|
|
|
|
|
Qwen2_5_VLForConditionalGeneration = importlib.import_module( |
|
|
'models.mmlatentdiscrete_qwen_vl' |
|
|
).Qwen2_5_VLForConditionalGeneration |
|
|
|
|
|
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("array/Qwen2.5-VL-MullGRPO") |
|
|
processor = AutoProcessor.from_pretrained( |
|
|
"array/Qwen2.5-VL-MullGRPO", |
|
|
trust_remote_code=True |
|
|
) |
|
|
``` |
|
|
|
|
|
|
|
|
## Citation [optional] |
|
|
|
|
|
``` |
|
|
@misc{ray2025mulltokensmodalityagnosticlatentthinking, |
|
|
title={Mull-Tokens: Modality-Agnostic Latent Thinking}, |
|
|
author={Arijit Ray and Ahmed Abdelkader and Chengzhi Mao and Bryan A. Plummer and Kate Saenko and Ranjay Krishna and Leonidas Guibas and Wen-Sheng Chu}, |
|
|
year={2025}, |
|
|
eprint={2512.10941}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV}, |
|
|
url={https://arxiv.org/abs/2512.10941}, |
|
|
} |
|
|
``` |
|
|
|
|
|
|