
Qwen2.5-VL-MullGRPO


	---
	library_name: transformers
	license: cc
	datasets:
	- array/SAT
	- multimodal-reasoning-lab/Zebra-CoT
	- Video-R1/Video-R1-data
	base_model:
	- Qwen/Qwen2.5-VL-7B-Instruct
	---

	- Repository: https://github.com/arijitray1993/mull-tokens
	- Paper: https://arxiv.org/abs/2512.10941


	## How to Get Started with the Model

	WORK IN PROGRESS: more details to be added soon!

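	We highly recommend installing the version of transformers bundled with this repository: https://github.com/arijitray1993/Mirage

	```
	git clone https://github.com/arijitray1993/Mirage
	# Assumes the bundled transformers directory sits at the top level of the clone.
	cd Mirage
	pip install -e ./transformers/.
	```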

	Next, clone this repo: https://github.com/arijitray1993/mull-tokens.

	We use a custom Qwen2.5-VL model class. The architecture is unchanged; the only difference is that some new tokens are added.
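To make "same architecture, more tokens" concrete, here is a toy sketch in plain Python. It is not the model's code: the token names `<latent_0>` and `<latent_1>`, the tiny embedding table, and the random initialization are all hypothetical and chosen only for illustration. The point is that adding tokens appends rows to the embedding table while the hidden size stays the same.

```python
import random


def extend_vocab(vocab, embeddings, new_tokens, seed=0):
    """Append one embedding row per unseen token; return how many were added."""
    rng = random.Random(seed)
    hidden = len(embeddings[0])  # hidden size is fixed by the existing table
    added = 0
    for tok in new_tokens:
        if tok in vocab:
            continue  # token already exists, nothing to add
        vocab[tok] = len(embeddings)  # next free row index
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(hidden)])
        added += 1
    return added


# Toy vocabulary with 2 tokens and hidden size 4 (hypothetical values).
vocab = {"hello": 0, "world": 1}
embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

n_added = extend_vocab(vocab, embeddings, ["<latent_0>", "<latent_1>", "hello"])
print(n_added, len(embeddings), len(embeddings[0]))  # 2 4 4
```

Only the number of rows grows (2 to 4 here); every layer that consumes a hidden vector of size 4 is unaffected.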

	```
	# Install the vision utilities first (run in a shell, not in Python):
	#   pip install "qwen-vl-utils[decord]==0.0.8"
	# Run this from the root of the cloned mull-tokens repo so that `models` is importable.

	import importlib
	from transformers import AutoProcessor

	# Load the custom model class from the mull-tokens repo instead of the stock transformers class.
	Qwen2_5_VLForConditionalGeneration = importlib.import_module(
	    "models.mmlatentdiscrete_qwen_vl"
	).Qwen2_5_VLForConditionalGeneration

	model = Qwen2_5_VLForConditionalGeneration.from_pretrained("array/Qwen2.5-VL-MullGRPO")
	processor = AutoProcessor.from_pretrained(
	    "array/Qwen2.5-VL-MullGRPO",
	    trust_remote_code=True,
	)
	```
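The loading code imports `models.mmlatentdiscrete_qwen_vl` by name, which only resolves when the root of the cloned mull-tokens repo is on Python's import path. The sketch below shows one way to check this before loading the model. The helper names `add_repo_to_path` and `module_available` and the local directory name `mull-tokens` are assumptions made for illustration and are not part of the repo.

```python
import importlib.util
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put the repo root at the front of sys.path so `models.*` can be imported."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def module_available(name):
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (here `models`) cannot be found.
        return False


# Hypothetical location of the clone; adjust to wherever the repo lives.
add_repo_to_path("mull-tokens")

if module_available("models.mmlatentdiscrete_qwen_vl"):
    print("Custom model module found.")
else:
    print("Custom model module not found; check the repo path.")
```

Running the loading code from inside the repo root should make this check unnecessary; it is mainly useful when calling the model from a script that lives elsewhere.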


	## Citation

	```
	@misc{ray2025mulltokensmodalityagnosticlatentthinking,
	title={Mull-Tokens: Modality-Agnostic Latent Thinking},
	author={Arijit Ray and Ahmed Abdelkader and Chengzhi Mao and Bryan A. Plummer and Kate Saenko and Ranjay Krishna and Leonidas Guibas and Wen-Sheng Chu},
	year={2025},
	eprint={2512.10941},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2512.10941},
	}
	```