---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit # please verify the license in the repository
---
# VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

This model, VolCano, is presented in the paper [VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models](https://arxiv.org/abs/2405.16919) and is designed for multi-step, visually grounded reasoning.

Code and further details are available at: https://github.com/RupertLuo/VoCoT
## Quick Start

This example demonstrates basic usage. For more details, please refer to the project's GitHub repository.
```python
from model.load_model import load_model, infer
from PIL import Image

# load the model and its preprocessor in fp16
model_path = 'luoruipu1/Volcano-7b'
model, preprocessor = load_model(model_path, precision='fp16')

# run inference; passing cot=True activates VoCoT multi-step reasoning
input_image = Image.open('figs/sample_input.jpg')
response = infer(model, preprocessor, input_image, 'Describe the image.', cot=True)
print('response: ', response[0])
```