How to process multi-images?

#8
by Yiyiyi - opened

Would you mind sharing how to process multiple images?

How should I change 'img_input=img_path'? It seems to accept only a single image.

Thank you!

generated_ids = model.generate(
    **inputs,
    temperature=0.1,
    top_p=0.001,
    repetition_penalty=1.05,
    do_sample=True,
    max_new_tokens=32768,
    img_input=img_path,
)

Thank you for your interest in Youtu-VL!

  • img_input is designed for CV tasks (segmentation, detection, depth, etc.) and currently supports single-image input only.

  • For VL tasks (VQA, multimodal reasoning, etc.), multi-image input is supported by adding multiple images inside messages. You can omit img_input in generate().

Example:

messages = [{
  "role": "user",
  "content": [
    {"type": "image", "image": "/path/to/image-A"},
    {"type": "image", "image": "/path/to/image-B"},
    {"type": "text",  "text": "Compare these two images."}
  ]
}]
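
If the number of images varies, the same structure can be built programmatically. The following is only a minimal sketch: build_multi_image_messages is a hypothetical helper name, not part of Youtu-VL, and it only assembles the messages list in the shape shown in the example above. The result would then go through whatever preprocessing you already use to produce inputs.

```python
from typing import Dict, List


def build_multi_image_messages(image_paths: List[str], prompt: str) -> List[Dict]:
    """Hypothetical helper: build one user turn holding several images and a prompt."""
    if not image_paths:
        raise ValueError("image_paths must contain at least one path")
    # One {"type": "image"} entry per image, in the order given.
    content = [{"type": "image", "image": path} for path in image_paths]
    # The text prompt goes last, after all images.
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


messages = build_multi_image_messages(
    ["/path/to/image-A", "/path/to/image-B"],
    "Compare these two images.",
)
```

Since img_input is for CV tasks only, it would be left out of generate() when using messages built this way.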

If you have any further questions, please feel free to let me know.
