OpenSearch-AI/Ops-Colqwen3-4B

Ops-Colqwen3-4B is a ColPali-style multimodal embedding model built on the Qwen3-VL-4B-Instruct architecture, developed and open-sourced by the Alibaba Cloud OpenSearch-AI team. It maps text queries and visual documents, such as images and PDF pages, into a unified, aligned multi-vector embedding space, so that visual documents can be retrieved directly with text queries.

The model is trained with a multi-stage strategy that combines large-scale text retrieval datasets with diverse visual document data; this hybrid approach strengthens its handling of complex document understanding and retrieval tasks. On the Vidore v1–v3 benchmarks, Ops-Colqwen3-4B achieves state-of-the-art results among models of comparable size.

Key Features

Model size: 4 billion parameters
Multimodal alignment: Enables fine-grained semantic alignment between text and images or PDF pages
Multi-vector embeddings: Following the ColPali design, each input produces multiple context-aware embedding vectors. Similarity is computed with MaxSim (late interaction): each query vector is matched to its most similar document vector, and these maxima are summed, enabling high-precision matching
Scalable embedding dimensions: Supports embedding dimensions up to 2,560 during inference via an extended projection head, enabling higher retrieval accuracy through more expressive representations. Lower-dimensional prefixes (e.g., the first 128 or 320 dimensions) remain highly effective for lightweight applications: at 128 dimensions, the Vidore v1+v2 score drops by less than one NDCG@5 point (84.04 vs. 84.87).
Multilingual support: Covers over 30 languages
Context length: Supports up to 32,000 tokens
Visual token capacity: Handles up to 1,280 visual tokens per page
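The MaxSim scoring and prefix truncation described above can be sketched in plain Python. This is a minimal illustration, not the model's implementation: the helper names `truncate_and_normalize` and `maxsim` are hypothetical, and re-normalizing each truncated prefix to unit length is an assumption of this sketch, since the card does not say whether the embedder does this.

```python
import math
from typing import List

Vector = List[float]


def truncate_and_normalize(vectors: List[Vector], dims: int) -> List[Vector]:
    """Keep the first `dims` components of each vector, then rescale to unit L2 norm.

    Re-normalizing the prefix is an assumption of this sketch.
    """
    result = []
    for v in vectors:
        prefix = v[:dims]
        norm = math.sqrt(sum(x * x for x in prefix)) or 1.0  # avoid dividing by zero
        result.append([x / norm for x in prefix])
    return result


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score.

    For every query vector, take the largest dot product with any document vector,
    then sum those maxima over all query vectors.
    """
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


# Toy 4-dimensional example: two query vectors, three page vectors.
query = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
page = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(maxsim(query, page))
print(maxsim(truncate_and_normalize(query, 2), truncate_and_normalize(page, 2)))
```

In this toy case the full vectors score about 1.6 and the 2-dimensional prefixes score 2.0, because truncation drops the component that separated the second query vector from its best match; with real prefixes of 128 or more dimensions the change is far smaller.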

Usage

Requirements

pillow
transformers>=4.57.0
qwen-vl-utils>=0.0.14
torch==2.8.0
flash-attn (optional; needed only for attn_implementation="flash_attention_2")

Basic Usage

```python
import torch
from PIL import Image
# Requires scripts/ops_colqwen3_embedder.py to be importable from the working directory.
from scripts.ops_colqwen3_embedder import OpsColQwen3Embedder

images = [Image.new("RGB", (32, 32), color="white"), Image.new("RGB", (16, 16), color="black")]
queries = ["Is attention really all you need?", "What is the amount of bananas farmed in Salvador?"]

embedder = OpsColQwen3Embedder(
    model_name="OpenSearch-AI/Ops-Colqwen3-4B",
    dims=2560,
    dtype=torch.float16,
    attn_implementation="flash_attention_2",
)

query_embeddings = embedder.encode_queries(queries)
image_embeddings = embedder.encode_images(images)
print(query_embeddings[0].shape, image_embeddings[0].shape)  # shapes: (23, 2560) and (18, 2560)

scores = embedder.compute_scores(query_embeddings, image_embeddings)

print(f"Scores:\n{scores}")
```

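To turn the score matrix into retrieval results, sort each query's row. The sketch below assumes `compute_scores` returns one row per query and one column per image, with higher values meaning more relevant (as with MaxSim), and that a tensor result has been converted to nested lists with `.tolist()`; `top_k_pages` is a hypothetical helper. For a PyTorch tensor, `scores.topk(k, dim=1)` yields equivalent indices directly.

```python
from typing import List


def top_k_pages(scores: List[List[float]], k: int) -> List[List[int]]:
    """Return, for each query row, the indices of the k highest-scoring pages (best first)."""
    ranked = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranked.append(order[:k])
    return ranked


# Hypothetical scores for 2 queries x 3 pages.
scores = [[10.2, 3.1, 7.5], [1.0, 9.9, 2.2]]
print(top_k_pages(scores, 2))  # [[0, 2], [1, 2]]
```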
Model Performance

Vidore v1 + v2 (NDCG@5)

Model	Dim	Vidore v1+v2	Vidore v2	Vidore v1
Ops-Colqwen3-4B	2560	84.87	68.7	91.4
Ops-Colqwen3-4B	1280	84.71	68.2	91.3
Ops-Colqwen3-4B	640	84.39	67.7	91.1
Ops-Colqwen3-4B	320	84.12	67.0	91.0
Ops-Colqwen3-4B	128	84.04	66.9	90.9
tomoro-colqwen3-embed-8b	320	83.52	65.4	90.8
EvoQwen2.5-VL-Retriever-7B-v1	128	83.41	65.2	90.7
tomoro-colqwen3-embed-4b	320	83.18	64.7	90.6
llama-nemoretriever-colembed-3b-v1	3072	83.10	63.3	91.0
SauerkrautLM-ColQwen3-8b-v0.1	128	82.91	62.5	91.1
EvoQwen2.5-VL-Retriever-3B-v1	128	82.76	63.0	90.7
SauerkrautLM-ColQwen3-4b-v0.1	128	81.97	59.9	90.8
jina-embeddings-v4	128	81.17	58.2	90.4
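The Ops-Colqwen3-4B rows above quantify the storage/accuracy trade-off. The short calculation below uses only those rows from the Vidore v1+v2 column and expresses each score as a fraction of the 2560-dimension result; `retention` is an illustrative helper, not part of any released code.

```python
from typing import Dict

# NDCG@5 on Vidore v1+v2 for Ops-Colqwen3-4B, copied from the table above.
V1_V2_NDCG5: Dict[int, float] = {2560: 84.87, 1280: 84.71, 640: 84.39, 320: 84.12, 128: 84.04}


def retention(scores: Dict[int, float], full_dim: int) -> Dict[int, float]:
    """Fraction of the full-dimension score kept at each embedding dimension."""
    base = scores[full_dim]
    return {dim: s / base for dim, s in scores.items()}


for dim, frac in sorted(retention(V1_V2_NDCG5, 2560).items(), reverse=True):
    print(f"{dim:>5} dims: {frac:.2%} of the 2560-dim score, {2560 // dim}x fewer values per vector")
```

At 128 dimensions the model keeps roughly 99.0% of its 2560-dimension Vidore v1+v2 score while storing 20x fewer values per vector.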

Vidore v3 (NDCG@10)

Model	Dim	PUB AVG
Ops-Colqwen3-4B	2560	61.27
Ops-Colqwen3-4B	1280	61.32
Ops-Colqwen3-4B	640	61.21
Ops-Colqwen3-4B	320	60.88
Ops-Colqwen3-4B	128	60.23
tomoro-colqwen3-embed-4b	320	60.19
SauerkrautLM-ColQwen3-8b-v0.1	128	58.55
jina-embeddings-v4	128	57.54
llama-nemoretriever-colembed-3b-v1	3072	57.07
SauerkrautLM-ColQwen3-4b-v0.1	128	56.03

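With only 128 dimensions, Ops-Colqwen3-4B (84.04 on Vidore v1+v2, 60.23 on Vidore v3) scores higher on both averages than every other model listed, including tomoro-colqwen3-embed-4b at 320 dimensions and the larger 7B and 8B models, making it well suited for latency- and memory-constrained applications.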
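A back-of-the-envelope estimate shows why the dimension matters for memory. This sketch assumes one stored vector per visual token at the 1,280-token page limit, 2 bytes per value (16-bit floats), and no quantization or compression; actual vector counts vary with image size and include a few prompt tokens. `page_index_bytes` is a hypothetical helper.

```python
def page_index_bytes(num_vectors: int, dims: int, bytes_per_value: int = 2) -> int:
    """Approximate bytes needed to store one page's multi-vector embedding."""
    return num_vectors * dims * bytes_per_value


# Assumed worst case: one vector for each of the 1,280 visual tokens a page can use.
for dims in (2560, 320, 128):
    size = page_index_bytes(1280, dims)
    print(f"{dims:>4} dims: {size:,} bytes (~{size / 1024:.0f} KiB) per page")
```

Under these assumptions a page costs about 6.25 MiB at 2560 dimensions but only 320 KiB at 128 dimensions.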

Citation

If you use this model in your work, please cite:

@misc{ops_colqwen3_4b,
  author       = {{OpenSearch-AI}},
  title        = {{Ops-Colqwen3: State-of-the-Art Multimodal Embedding Model for Visual Document Retrieval}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/OpenSearch-AI/Ops-Colqwen3-4B}},
}
