Update README.md
README.md CHANGED

```diff
@@ -27,7 +27,7 @@ language:
 
 ## Summary
 
-- Mantis is
+- Mantis-Fuyu is a Fuyu-based LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e. 36 hours on 16xA100-40G).
 - Mantis is trained to have multi-image skills including co-reference, reasoning, comparing, and temporal understanding.
 - Mantis reaches state-of-the-art performance on five multi-image benchmarks (NLVR2, Q-Bench, BLINK, MVBench, Mantis-Eval), and also maintains strong single-image performance on par with CogVLM and Emu2.
 
@@ -58,10 +58,11 @@ image2 = "image2.jpg"
 images = [Image.open(image1), Image.open(image2)]
 
 # load processor and model
-from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
-
+# from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
+from mantis.models.mfuyu import MFuyuForCausalLM, MFuyuProcessor
+processor = MFuyuProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
 attn_implementation = None  # or "flash_attention_2"
-model =
+model = MFuyuForCausalLM.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu", device_map="cuda", torch_dtype=torch.bfloat16, attn_implementation=attn_implementation)
 
 generation_kwargs = {
     "max_new_tokens": 1024,
```
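The hunks above only show a fragment of the README's inference snippet: the prompt construction and the actual generation call fall outside the diff. A minimal sketch of how the shown pieces might fit together, assuming the processor and model follow standard Hugging Face call conventions — `processor(text=..., images=..., return_tensors="pt")`, `model.generate`, and `processor.batch_decode` are assumptions, not shown in the diff:

```python
def build_generation_kwargs(max_new_tokens=1024):
    # Mirrors the generation_kwargs dict in the README snippet; only
    # max_new_tokens is visible in the diff, so no other keys are invented.
    return {"max_new_tokens": max_new_tokens}


def run_inference(image_paths, prompt):
    """Load the Fuyu-based checkpoint and generate a response.

    Heavy imports are kept local so the module can be imported without
    the mantis package or a GPU being available.
    """
    import torch
    from PIL import Image
    from mantis.models.mfuyu import MFuyuForCausalLM, MFuyuProcessor

    processor = MFuyuProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
    attn_implementation = None  # or "flash_attention_2"
    model = MFuyuForCausalLM.from_pretrained(
        "TIGER-Lab/Mantis-8B-Fuyu",
        device_map="cuda",
        torch_dtype=torch.bfloat16,
        attn_implementation=attn_implementation,
    )

    images = [Image.open(p) for p in image_paths]
    # Assumption: standard Hugging Face processor signature; the diff does
    # not show how text and images are combined for the Fuyu model.
    inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **build_generation_kwargs())
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

The model-loading call itself (`device_map="cuda"`, `torch_dtype=torch.bfloat16`, `attn_implementation`) is taken verbatim from the diff; everything around it is a sketch.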