frozenc committed on
Commit e9046ac · verified · 1 Parent(s): 4894b7d
Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -12,11 +12,11 @@ tags:
  - multilingual-embedding
  - colqwen3
  ---
- # OpenSearch-AI/Ops-ColQwen3-4B
 
- **Ops-ColQwen3-4B** is a ColPali-style multimodal embedding model based on the **Qwen3-VL-4B-Instruct** architecture, developed and open-sourced by the Alibaba Cloud OpenSearch-AI team. It maps text queries and visual documents such as images and PDF pages into a unified, aligned **multi-vector embedding space**, enabling highly effective retrieval of visual documents.
 
- The model is trained using a multi-stage strategy that combines large-scale text-based retrieval datasets with diverse visual document data. This hybrid training approach significantly enhances its capability to handle complex document understanding and retrieval tasks. On the Vidore v1–v3 benchmarks, **Ops-ColQwen3-4B** achieves **state-of-the-art results** among models of comparable size.
 
  ## Key Features
 
@@ -70,11 +70,11 @@ print(f"Scores:\n{scores}")
 
  | Model | Dim | Vidore v1+v2 | Vidore v2 | Vidore v1 |
  |--------------------------------------------|------|--------------|-----------|-----------|
- | **Ops-ColQwen3-4B** | 2560 | **84.87** | **68.7** | **91.4** |
- | **Ops-ColQwen3-4B** | 1280 | 84.71 | 68.2 | 91.3 |
- | **Ops-ColQwen3-4B** | 640 | 84.39 | 67.7 | 91.1 |
- | **Ops-ColQwen3-4B** | 320 | 84.12 | 67.0 | 91.0 |
- | **Ops-ColQwen3-4B** | 128 | 84.04 | 66.9 | 90.9 |
  | tomoro-colqwen3-embed-8b | 320 | 83.52 | 65.4 | 90.8 |
  | EvoQwen2.5-VL-Retriever-7B-v1 | 128 | 83.41 | 65.2 | 90.7 |
  | tomoro-colqwen3-embed-4b | 320 | 83.18 | 64.7 | 90.6 |
@@ -90,11 +90,11 @@ print(f"Scores:\n{scores}")
 
  | Model | Dim | PUB AVG |
  |--------------------------------------------|------|---------|
- | **Ops-ColQwen3-4B** | 2560 | 61.27 |
- | **Ops-ColQwen3-4B** | 1280 | **61.32** |
- | **Ops-ColQwen3-4B** | 640 | 61.21 |
- | **Ops-ColQwen3-4B** | 320 | 60.88 |
- | **Ops-ColQwen3-4B** | 128 | 60.23 |
  | tomoro-colqwen3-embed-4b | 320 | 60.19 |
  | SauerkrautLM-ColQwen3-8b-v0.1 | 128 | 58.55 |
  | jina-embedding-v4 | 128 | 57.54 |
@@ -102,7 +102,7 @@ print(f"Scores:\n{scores}")
  | SauerkrautLM-ColQwen3-4b-v0.1 | 128 | 56.03 |
 
 
- > With only **128 dimensions**, `Ops-ColQwen3-4B` outperforms other 4B-parameter models such as `tomoro-colqwen3-embed-4b`, making it well-suited for latency- and memory-constrained applications.
 
 
  ## Citation
@@ -112,8 +112,8 @@ If you use this model in your work, please cite:
  ```bibtex
  @misc{ops_colqwen3_4b,
  author = {{OpenSearch-AI}},
- title = {{Ops-ColQwen3: State-of-the-Art Multimodal Embedding Model for Visual Document Retrieval}},
  year = {2026},
- howpublished = {\url{https://huggingface.co/OpenSearch-AI/Ops-ColQwen3-4B}},
  }
  ```
 
  - multilingual-embedding
  - colqwen3
  ---
+ # OpenSearch-AI/Ops-Colqwen3-4B
 
+ **Ops-Colqwen3-4B** is a ColPali-style multimodal embedding model based on the **Qwen3-VL-4B-Instruct** architecture, developed and open-sourced by the Alibaba Cloud OpenSearch-AI team. It maps text queries and visual documents such as images and PDF pages into a unified, aligned **multi-vector embedding space**, enabling highly effective retrieval of visual documents.
 
+ The model is trained using a multi-stage strategy that combines large-scale text-based retrieval datasets with diverse visual document data. This hybrid training approach significantly enhances its capability to handle complex document understanding and retrieval tasks. On the Vidore v1–v3 benchmarks, **Ops-Colqwen3-4B** achieves **state-of-the-art results** among models of comparable size.
 
  ## Key Features
 
 
 
  | Model | Dim | Vidore v1+v2 | Vidore v2 | Vidore v1 |
  |--------------------------------------------|------|--------------|-----------|-----------|
+ | **Ops-Colqwen3-4B** | 2560 | **84.87** | **68.7** | **91.4** |
+ | **Ops-Colqwen3-4B** | 1280 | 84.71 | 68.2 | 91.3 |
+ | **Ops-Colqwen3-4B** | 640 | 84.39 | 67.7 | 91.1 |
+ | **Ops-Colqwen3-4B** | 320 | 84.12 | 67.0 | 91.0 |
+ | **Ops-Colqwen3-4B** | 128 | 84.04 | 66.9 | 90.9 |
  | tomoro-colqwen3-embed-8b | 320 | 83.52 | 65.4 | 90.8 |
  | EvoQwen2.5-VL-Retriever-7B-v1 | 128 | 83.41 | 65.2 | 90.7 |
  | tomoro-colqwen3-embed-4b | 320 | 83.18 | 64.7 | 90.6 |
 
 
  | Model | Dim | PUB AVG |
  |--------------------------------------------|------|---------|
+ | **Ops-Colqwen3-4B** | 2560 | 61.27 |
+ | **Ops-Colqwen3-4B** | 1280 | **61.32** |
+ | **Ops-Colqwen3-4B** | 640 | 61.21 |
+ | **Ops-Colqwen3-4B** | 320 | 60.88 |
+ | **Ops-Colqwen3-4B** | 128 | 60.23 |
  | tomoro-colqwen3-embed-4b | 320 | 60.19 |
  | SauerkrautLM-ColQwen3-8b-v0.1 | 128 | 58.55 |
  | jina-embedding-v4 | 128 | 57.54 |
 
  | SauerkrautLM-ColQwen3-4b-v0.1 | 128 | 56.03 |
 
 
+ > With only **128 dimensions**, `Ops-Colqwen3-4B` outperforms other 4B-parameter models such as `tomoro-colqwen3-embed-4b`, making it well-suited for latency- and memory-constrained applications.
 
 
  ## Citation
 
  ```bibtex
  @misc{ops_colqwen3_4b,
  author = {{OpenSearch-AI}},
+ title = {{Ops-Colqwen3: State-of-the-Art Multimodal Embedding Model for Visual Document Retrieval}},
  year = {2026},
+ howpublished = {\url{https://huggingface.co/OpenSearch-AI/Ops-Colqwen3-4B}},
  }
  ```
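
Reviewer note: the README in this diff describes a ColPali-style model that maps queries and documents into a multi-vector embedding space. For readers unfamiliar with that setup, the sketch below illustrates late-interaction (MaxSim) scoring, the rule used by ColBERT and ColPali: each query vector is matched to its most similar document vector, and the per-query maxima are summed. It is an illustration only, not code from this repository. The names `dot` and `maxsim_score` and the toy 2-dimensional vectors are hypothetical, and this diff does not show how the model's own scoring helper normalizes or batches its inputs.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector take the best-matching
    document vector (max dot product), then sum those maxima."""
    if not doc_vecs:
        raise ValueError("document must contain at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): two query token vectors, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query vector
doc_b = [[0.5, 0.5]]                          # only a partial match for each

# Rank documents by descending MaxSim score.
ranked = sorted(
    [("doc_a", maxsim_score(query, doc_a)), ("doc_b", maxsim_score(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Because every document keeps one vector per token or image patch, the per-vector dimension (the `Dim` column in the tables above) directly scales index size, which is why the lower-dimension rows matter for memory-constrained deployments.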