---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---

# Introduction
This is the pretrained model for the paper "**Monet: Reasoning in Latent Visual Space Beyond Images and Language**".

**Paper:** http://arxiv.org/abs/2511.21395

**Code:** https://github.com/NOVAglow646/Monet

**How to use this model:** We provide an [inference example](https://github.com/NOVAglow646/Monet/blob/main/inference/vllm_inference_example.py) in our GitHub repo.

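The repository's example script uses vLLM; as an illustration only, here is a minimal sketch of loading the checkpoint through the standard `transformers` Qwen2.5-VL classes instead. It assumes the Monet checkpoint loads with the stock Qwen2.5-VL processor and model classes, and the model ID, image URL, and generation settings are placeholders — see the linked script for the authoritative usage.

```python
def build_messages(image: str, question: str) -> list:
    """Assemble a Qwen2.5-VL-style chat turn with one image and one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]


def run(model_id: str, image: str, question: str) -> str:
    """Load the checkpoint and answer one visual question (requires a GPU)."""
    # Heavy imports are kept local so build_messages works without them.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info  # extracts images from messages

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_messages(image, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    images, _ = process_vision_info(messages)
    inputs = processor(text=[text], images=images, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=256)
    out = out[:, inputs.input_ids.shape[1]:]  # keep only newly generated tokens
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

This mirrors the usual Qwen2.5-VL inference flow (chat template → vision preprocessing → `generate`); for batched or high-throughput inference, prefer the vLLM script linked above.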
# Citation
If you find this work useful, please cite it with the following BibTeX entry. Thank you for your support!

```bibtex
@misc{wang2025monetreasoninglatentvisual,
      title={Monet: Reasoning in Latent Visual Space Beyond Images and Language},
      author={Qixun Wang and Yang Shi and Yifei Wang and Yuanxing Zhang and Pengfei Wan and Kun Gai and Xianghua Ying and Yisen Wang},
      year={2025},
      eprint={2511.21395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21395},
}
```