intfloat/mmE5-mllama-11b-instruct


Samoed committed on Feb 23, 2025 · Commit cbde83e (verified) · 1 parent: 31932a6

use only `<|image|>`

Files changed (1): custom_st.py (+1 -1)

@@ -67,7 +67,7 @@ class MultiModalTransformer(BaseTransformer):
                 if sub_item["type"] == "text":
                     text += sub_item["content"]
                 elif sub_item["type"] in ["image_bytes", "image_path"]:
-                    text += "<|image|><|begin_of_text|>"
+                    text += "<|image|>"
                     if sub_item["type"] == "image_bytes":
                         img = Image.open(BytesIO(sub_item["content"])).convert("RGB")
                     else:
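The hunk shows only part of the input loop. The following is a minimal sketch of the prompt-building step as it behaves after this commit. It assumes, as the context lines suggest, that each input is a list of dicts with `type` and `content` keys. `build_prompt`, `IMAGE_TOKEN`, the file name, and the instruction text are hypothetical. Image decoding (`Image.open(...).convert("RGB")`) is left out so the sketch runs without Pillow.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical constant: the only placeholder emitted per image after cbde83e.
IMAGE_TOKEN = "<|image|>"


def build_prompt(items: List[Dict[str, Any]]) -> Tuple[str, List[Union[bytes, str]]]:
    """Hypothetical helper mirroring the loop in the hunk above.

    Concatenates text items and inserts one IMAGE_TOKEN per image item.
    Returns the prompt string and the raw image sources (bytes or paths)
    in input order. Images are not decoded here.
    """
    text = ""
    images: List[Union[bytes, str]] = []
    for sub_item in items:
        if sub_item["type"] == "text":
            text += sub_item["content"]
        elif sub_item["type"] in ["image_bytes", "image_path"]:
            # Before this commit the code appended "<|image|><|begin_of_text|>".
            text += IMAGE_TOKEN
            images.append(sub_item["content"])
        else:
            # Assumption for this sketch: reject unknown item types explicitly.
            raise ValueError(f"unknown item type: {sub_item['type']!r}")
    return text, images


# Usage with hypothetical inputs; no file is opened.
prompt, imgs = build_prompt([
    {"type": "image_path", "content": "cat.jpg"},
    {"type": "text", "content": "Represent the given image."},
])
print(prompt)  # <|image|>Represent the given image.
print(imgs)    # ['cat.jpg']
```

Under these assumptions, the only change the commit makes to the prompt is that `<|begin_of_text|>` no longer follows each image placeholder.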