---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit
---

# VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

VolCano is the model presented in the paper *VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models*. It is designed for multi-step, visually grounded reasoning.

Code and further details are available at: https://github.com/RupertLuo/VoCoT

## Quick Start

This example demonstrates basic usage. For more details, please refer to the project's GitHub repository.

```python
# Requires the VoCoT repository on your PYTHONPATH: the `model` package
# ships with the code at https://github.com/RupertLuo/VoCoT, not with the checkpoint.
from model.load_model import load_model, infer
from PIL import Image

# Load the model and its preprocessor in fp16.
model_path = 'luoruipu1/Volcano-7b'
model, preprocessor = load_model(model_path, precision='fp16')

# Perform reasoning; activate VoCoT multi-step reasoning by passing cot=True.
input_image = Image.open('figs/sample_input.jpg')
response = infer(model, preprocessor, input_image, 'Describe the image.', cot=True)
print('response: ', response[0])
```
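
Inputs from the wild are not always 3-channel RGB (screenshots are often RGBA, scans grayscale), which can trip up vision preprocessors. A minimal, hedged sketch of an image-normalization helper you might run before `infer` (the `prepare_image` name is our own; VoCoT's preprocessor still applies its model-specific transforms afterward):

```python
from PIL import Image


def prepare_image(path: str) -> Image.Image:
    # Open the file and force a 3-channel RGB image, dropping any
    # alpha channel and expanding grayscale to three channels.
    img = Image.open(path)
    return img.convert("RGB")
```

This keeps the Quick Start call unchanged: `input_image = prepare_image('figs/sample_input.jpg')`.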