nielsr HF Staff committed on
Commit
c071f98
·
verified ·
1 Parent(s): 864d61b

Update pipeline tag to zero-shot-image-classification


This PR updates the `pipeline_tag` in the model card from `image-text-to-text` to `zero-shot-image-classification`.

The paper abstract states that UniME learns "discriminative representations for diverse downstream tasks," particularly for "image-text retrieval and clustering." The quick start guide likewise demonstrates computing a similarity score between image and text embeddings. This functionality aligns with discriminative tasks such as zero-shot classification and retrieval rather than with text generation.

Changing the `pipeline_tag` will improve the model's discoverability on the Hugging Face Hub, allowing users to find it under the most relevant category for its primary use case.
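To illustrate why the discriminative tag fits: zero-shot classification over embeddings reduces to a nearest-neighbor lookup by similarity score. A minimal sketch with toy vectors standing in for UniME's image and text embeddings (the helper names and values here are hypothetical, not part of the model's API):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, label_embs):
    # Pick the candidate label whose text embedding is most
    # similar to the image embedding.
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))

# Toy embeddings standing in for UniME outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
print(zero_shot_classify(image_emb, label_embs))  # -> a photo of a cat
```

In practice the embeddings would come from encoding the image and each candidate caption with the model, but the classification step itself is exactly this similarity ranking.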

Files changed (1)
  1. README.md +10 -7
README.md CHANGED

```diff
@@ -1,15 +1,15 @@
 ---
-license: mit
+base_model:
+- llava-hf/llava-onevision-qwen2-7b-ov-hf
 datasets:
 - TIGER-Lab/MMEB-train
 language:
 - en
+library_name: transformers
+license: mit
 metrics:
 - recall
-base_model:
-- llava-hf/llava-onevision-qwen2-7b-ov-hf
-pipeline_tag: image-text-to-text
-library_name: transformers
+pipeline_tag: zero-shot-image-classification
 ---
 
 # Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
@@ -72,14 +72,17 @@ def appply_chat_template(image=None, text=None):
 "role": "user",
 "content": [
 {"type": "image", "image": image},
-{"type": "text", "text": "Summary above image in one word:\n"},
+{"type": "text", "text": "Summary above image in one word:
+"},
 ],
 }]
 elif text!= None:
 conversation_image = [{
 "role": "user",
 "content": [
-{"type": "text", "text": f"{text}\nSummary above sentence in one word:\n"},
+{"type": "text", "text": f"{text}
+Summary above sentence in one word:
+"},
 ],
 }]
 return conversation_image
```