mtensor committed
Commit · 860995a
1 Parent(s): 03212a3
add examples to readme
README.md CHANGED
@@ -38,6 +38,64 @@ Though not the focus of this model, we did evaluate it on standard image underst
| COCO Captions | 141 | 138 | n/a | n/a | 149 | 135 | 138 |
| AI2D | 64.5 | 73.7 | n/a | 62.3 | 81.2 | n/a | n/a |

## How to Use

You can load the model and perform inference as follows:
```python
from transformers import FuyuForCausalLM, AutoTokenizer, FuyuProcessor, FuyuImageProcessor
from PIL import Image

# load model, tokenizer, and processor
pretrained_path = "adept/fuyu-8b"
tokenizer = AutoTokenizer.from_pretrained(pretrained_path)

image_processor = FuyuImageProcessor()
processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)

model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map="cuda:0")

# test inference
text_prompt = "Generate a coco-style caption.\n"
image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
image_pil = Image.open(image_path)

model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
# move every input tensor to the GPU that holds the model
for k, v in model_inputs.items():
    model_inputs[k] = v.to("cuda:0")

generation_output = model.generate(**model_inputs, max_new_tokens=8)
# keep only the last 38 characters of the decoded text, i.e. the caption
generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-38:]
assert generation_text == "A bus parked on the side of a road.<s>"
```
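
The `[-38:]` slice above counts characters, so it only works when the output length is known in advance. As a minimal sketch of an alternative, assuming `generate` returns each prompt's token ids followed by the newly generated ids (the usual behavior for decoder-only models in `transformers`), the prompt can be cut off at the token level instead. The helper name `keep_new_tokens` is hypothetical, and the example runs on plain lists of toy ids so it needs no model:

```python
from typing import List, Sequence


def keep_new_tokens(sequences: Sequence[Sequence[int]], prompt_length: int) -> List[List[int]]:
    """Drop the first `prompt_length` ids from every generated sequence."""
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    return [list(seq[prompt_length:]) for seq in sequences]


# Toy ids standing in for generate() output: 4 prompt ids, then 3 new ids.
toy_output = [[101, 7, 8, 9, 42, 43, 44]]
new_ids = keep_new_tokens(toy_output, prompt_length=4)
print(new_ids)  # [[42, 43, 44]]
```

With real tensors, the equivalent would be slicing `generation_output` along the sequence dimension by the prompt length before calling `processor.batch_decode`, assuming the processor output exposes the prompt ids under an `input_ids` key.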

Fuyu can also perform some question answering on natural images and charts:
```python
# reuses `model`, `processor`, and `Image` from the previous snippet
text_prompt = "What color is the bus?\n"
image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
image_pil = Image.open(image_path)

model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
for k, v in model_inputs.items():
    model_inputs[k] = v.to("cuda:0")

generation_output = model.generate(**model_inputs, max_new_tokens=6)
# keep only the last 17 characters of the decoded text, i.e. the answer
generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-17:]
assert generation_text == "The bus is blue.\n"


# chart question answering
text_prompt = "What is the highest life expectancy at birth of male?\n"
image_path = "chart.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/chart.png
image_pil = Image.open(image_path)

model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
for k, v in model_inputs.items():
    model_inputs[k] = v.to("cuda:0")

generation_output = model.generate(**model_inputs, max_new_tokens=16)
# keep only the last 55 characters of the decoded text, i.e. the answer
generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-55:]
assert generation_text == "The life expectancy at birth of males in 2018 is 80.7.\n"
```
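
The snippets above repeat the same four steps for every prompt: run the processor, move the tensors to the device, generate, and decode. A minimal sketch of a wrapper for those steps is shown here; the function name `ask_fuyu` and its parameters are hypothetical, and it only assumes that `model` and `processor` behave like the objects created in the first snippet:

```python
from typing import Any


def ask_fuyu(model: Any, processor: Any, image: Any, prompt: str,
             max_new_tokens: int, device: str = "cuda:0") -> str:
    """Run one prompt/image pair and return the full decoded string."""
    model_inputs = processor(text=prompt, images=[image], device=device)
    # Move every returned tensor to the target device, as in the snippets above.
    model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
    output = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    # The decoded string still contains the prompt; slice it as needed.
    return processor.batch_decode(output, skip_special_tokens=True)[0]
```

A call such as `ask_fuyu(model, processor, Image.open("bus.png"), "What color is the bus?\n", max_new_tokens=6)[-17:]` would then correspond to the bus question above.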

## Uses

### Direct Use