Add model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ license: mit # Please verify the license in the repository
+ ---
+
+ # VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
+
+ This model, VolCano, is presented in the paper [VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models](https://arxiv.org/abs/2405.16919) and is designed for multi-step, visually grounded reasoning.
+
+ Code and further details are available at: https://github.com/RupertLuo/VoCoT
+
+ ## Quick Start
+
+ This example demonstrates basic usage. For more details, please refer to the project's GitHub repository.
+
+ ```python
+ from model.load_model import load_model, infer
+ from PIL import Image
+
+ # Load the model and its preprocessor in fp16 precision
+ model_path = 'luoruipu1/Volcano-7b'
+ model, preprocessor = load_model(model_path, precision='fp16')
+
+ # Run inference; passing cot=True activates VoCoT multi-step reasoning
+ input_image = Image.open('figs/sample_input.jpg')
+ response = infer(model, preprocessor, input_image, 'Describe the image.', cot=True)
+ print('response: ', response[0])
+ ```
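
The Quick Start above opens a JPEG sample. If your inputs may carry an alpha channel or a palette (e.g. PNG or GIF files), normalizing to 3-channel RGB before calling `infer` is a common safeguard; this helper is a minimal sketch of that preprocessing step and is my assumption, not part of the VoCoT repository's documented API:

```python
from PIL import Image

def load_rgb(path):
    # Convert any input mode (RGBA, palette, grayscale) to 3-channel RGB,
    # the format most vision-language preprocessors expect (assumption).
    return Image.open(path).convert('RGB')
```

The returned image can then be passed to `infer` exactly as in the snippet above.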