Add pipeline tag and improve model card
#1 opened by nielsr (HF Staff)
README.md CHANGED
|
@@ -1,13 +1,28 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
-
## ObjEmbed: Towards Universal Multimodal Object Embeddings
|
| 5 |
|
| 6 |
-
|
| 7 |
|
| 8 |
-
|
| 9 |
|
| 10 |
-
|
| 11 |
@article{fu2026objembed,
|
| 12 |
title={ObjEmbed: Towards Universal Multimodal Object Embeddings},
|
| 13 |
author={Fu, Shenghao and Su, Yukun and Rao, Fengyun and LYU, Jing and Xie, Xiaohua and Zheng, Wei-Shi},
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
pipeline_tag: image-feature-extraction
|
| 4 |
---
|
| 5 |
|
| 6 |
+
# ObjEmbed: Towards Universal Multimodal Object Embeddings
|
| 7 |
|
| 8 |
+
[ObjEmbed](https://arxiv.org/abs/2602.01753) is a multimodal embedding model that decomposes an input image into multiple regional embeddings, each corresponding to an individual object, along with global embeddings. It is designed to bridge the gap between global image-text alignment and fine-grained region-phrase alignment.
|
| 9 |
|
| 10 |
+
- **Paper:** [ObjEmbed: Towards Universal Multimodal Object Embeddings](https://arxiv.org/abs/2602.01753)
|
| 11 |
+
- **Code:** [GitHub - WeChatCV/ObjEmbed](https://github.com/WeChatCV/ObjEmbed)
|
| 12 |
+
|
| 13 |
+
## Key Features
|
| 14 |
+
|
| 15 |
+
ObjEmbed enjoys three key properties:
|
| 16 |
+
|
| 17 |
+
- **Object-Oriented Representation**: It captures both semantic and spatial aspects of objects by generating two complementary embeddings for each region: an object embedding for semantic matching and an IoU embedding that predicts localization quality.
|
| 18 |
+
- **Versatility**: It seamlessly handles both region-level tasks (like visual grounding and local image retrieval) and image-level tasks (global image retrieval).
|
| 19 |
+
- **Efficient Encoding**: All objects in an image, along with the full image, are encoded in a single forward pass for high efficiency.
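
As a rough illustration of how the two per-region embeddings described above might be used together, the sketch below ranks candidate regions for a text query. It is not taken from the ObjEmbed code base: the function names (`cosine`, `rank_regions`), the toy vectors, the assumption that the IoU embedding has already been reduced to a scalar quality score in [0, 1], and the choice to multiply that score by the cosine similarity are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_regions(text_emb, object_embs, iou_scores):
    """Return region indices sorted from best to worst match.

    Each region's score is its semantic similarity to the query
    multiplied by its predicted localization quality. The product
    rule is an assumption made for this sketch, not a documented
    ObjEmbed scoring function.
    """
    scored = []
    for idx, (obj_emb, iou) in enumerate(zip(object_embs, iou_scores)):
        scored.append((cosine(text_emb, obj_emb) * iou, idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored]


# Toy data (hypothetical): three candidate regions with 2-D embeddings.
text_emb = [1.0, 0.0]
object_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
iou_scores = [0.3, 0.9, 0.8]

ranking = rank_regions(text_emb, object_embs, iou_scores)
```

In this toy example, region 0 matches the query exactly but has a poorly localized box, so a region with slightly lower semantic similarity and a much better predicted box is ranked first.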
|
| 20 |
+
|
| 21 |
+
## Citation
|
| 22 |
+
|
| 23 |
+
If you find our work helpful for your research, please consider citing our paper:
|
| 24 |
+
|
| 25 |
+
```bibtex
|
| 26 |
@article{fu2026objembed,
|
| 27 |
title={ObjEmbed: Towards Universal Multimodal Object Embeddings},
|
| 28 |
author={Fu, Shenghao and Su, Yukun and Rao, Fengyun and LYU, Jing and Xie, Xiaohua and Zheng, Wei-Shi},
|