How to process multi-images?
#8, opened by Yiyiyi
Would you mind sharing how to process multiple images?
How should I change 'img_input=img_path,'? It seems to accept only a single image.
Thank you!
generated_ids = model.generate(
    **inputs,
    temperature=0.1,
    top_p=0.001,
    repetition_penalty=1.05,
    do_sample=True,
    max_new_tokens=32768,
    img_input=img_path,
)
Thank you for your interest in Youtu-VL!
img_input is designed for CV tasks (segmentation, detection, depth, etc.) and currently supports single-image input only.
For VL tasks (VQA, multimodal reasoning, etc.), multi-image input is supported by adding multiple images inside messages. You can omit img_input in generate().
Example:
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "/path/to/image-A"},
        {"type": "image", "image": "/path/to/image-B"},
        {"type": "text", "text": "Compare these two images."}
    ]
}]
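If the number of images varies, the same structure can be built programmatically. The following is only a sketch: build_multi_image_message is a hypothetical helper written for illustration, not part of Youtu-VL, and it assumes the message layout shown in the example above.

```python
from typing import Any, Dict, List, Sequence


def build_multi_image_message(
    image_paths: Sequence[str], prompt: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build one user turn with an image entry per path,
    followed by the text prompt."""
    if not image_paths:
        raise ValueError("image_paths must contain at least one image")
    # One {"type": "image", ...} entry per image, in the order given.
    content: List[Dict[str, str]] = [
        {"type": "image", "image": path} for path in image_paths
    ]
    # The text prompt goes last, as in the example above.
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


messages = build_multi_image_message(
    ["/path/to/image-A", "/path/to/image-B"],
    "Compare these two images.",
)
```

The resulting messages list can then go through the same preprocessing you already use for single-image VL input, with img_input left out of the generate() call.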
If you have any further questions, please feel free to let me know.