Add model card and metadata #1
Opened by nielsr (HF Staff)

README.md (added)
---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit # Please verify license in the repository
---

# VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

VolCano is the model presented in the paper [VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models](https://arxiv.org/abs/2405.16919). It is designed for multi-step, visually grounded reasoning.

Code and further details are available at https://github.com/RupertLuo/VoCoT.

## Quick Start

This example demonstrates basic usage. For more details, refer to the project's GitHub repository.

```python
# `load_model` and `infer` come from the VoCoT GitHub repository
# (https://github.com/RupertLuo/VoCoT); run this from a clone of that repo.
from model.load_model import load_model, infer
from PIL import Image

# Load the model in half precision (fp16)
model_path = 'luoruipu1/Volcano-7b'
model, preprocessor = load_model(model_path, precision='fp16')

# Run inference; passing cot=True activates VoCoT (multi-step, visually grounded reasoning)
input_image = Image.open('figs/sample_input.jpg')
response = infer(model, preprocessor, input_image, 'Describe the image.', cot=True)
print('response: ', response[0])
```
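Visually grounded reasoning ties the objects mentioned in the text to regions of the image. If you want to post-process `response[0]`, for example to draw or crop the referenced regions, the following is a minimal sketch. It assumes, hypothetically, that regions appear in the text as four bracketed numbers `[x1, y1, x2, y2]` normalized to the range [0, 1]. This card does not specify Volcano's actual output format, so check the repository before relying on it. `extract_boxes` and `to_pixels` are illustrative helper names, not part of the VoCoT code.

```python
import re
from typing import List, Tuple

# Hypothetical box format: four comma-separated numbers in square brackets,
# e.g. "[0.12, 0.30, 0.58, 0.91]". This is an assumption, not a documented
# VoCoT output format.
BOX_PATTERN = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,"
    r"\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]"
)

Box = Tuple[float, float, float, float]


def extract_boxes(response: str) -> List[Box]:
    """Return every [x1, y1, x2, y2] box found in a response string, in order."""
    return [tuple(float(v) for v in m.groups()) for m in BOX_PATTERN.finditer(response)]


def to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a box from normalized [0, 1] coordinates to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))


# With the Quick Start above, you would call: extract_boxes(response[0])
text = "The dog [0.10, 0.20, 0.50, 0.80] is next to the ball [0.6,0.7,0.9,1.0]."
boxes = extract_boxes(text)
print(boxes)                          # [(0.1, 0.2, 0.5, 0.8), (0.6, 0.7, 0.9, 1.0)]
print(to_pixels(boxes[0], 640, 480))  # (64, 96, 320, 384)
```

The pixel tuples can be passed to Pillow's `Image.crop` or `ImageDraw.rectangle`, both of which accept a `(left, upper, right, lower)` box.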