Fix ColPali claim; add single-vector + storage highlights
README.md
CHANGED
@@ -51,6 +51,12 @@ model-index:
NanoVDR-L is a 151M-parameter text-only query encoder for visual document retrieval, trained via asymmetric cross-modal distillation from [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B). It pairs ModernBERT-base with a 2-layer MLP projector and achieves the highest ViDoRe v1 score (82.4) among all NanoVDR variants.

### Highlights

- **Single-vector retrieval**: queries and documents share the same 2048-dim embedding space as [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B); retrieval is a plain dot product, FAISS-compatible, and costs **4 KB per page** (float16).
- **Lightweight on storage**: 612 MB model; the document index takes 64× less space than ColPali's multi-vector patch embeddings.
- **Asymmetric setup**: a small 151M-parameter text encoder runs at query time; the large VLM indexes documents offline, once.
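
The storage and retrieval claims above can be checked with a little arithmetic. The sketch below is illustrative only and uses plain Python. The ColPali figures (about 1030 patch vectors of 128 dims per page, stored in float16) are assumptions that this README does not state, and `index_bytes_per_page` and `retrieve` are hypothetical helpers, not part of NanoVDR or FAISS.

```python
BYTES_PER_FLOAT16 = 2


def index_bytes_per_page(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT16):
    """Bytes needed to store the embedding(s) of one page."""
    return num_vectors * dim * bytes_per_value


# Single-vector index: one 2048-dim float16 vector per page (from the README).
single = index_bytes_per_page(num_vectors=1, dim=2048)  # 4096 bytes = 4 KB

# ASSUMPTION: a ColPali-style index with ~1030 vectors of 128 dims per page.
multi = index_bytes_per_page(num_vectors=1030, dim=128)  # 263680 bytes

ratio = multi / single


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, pages, top_k=3):
    """Rank pages by dot product with the query, best first.

    Returns a list of (page_index, score) tuples of length <= top_k.
    """
    scored = [(i, dot(query, page)) for i, page in enumerate(pages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dim vectors standing in for real 2048-dim embeddings.
toy_pages = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
toy_query = [0.9, 0.1, 0.0]
ranking = retrieve(toy_query, toy_pages, top_k=2)

print(f"single-vector: {single} bytes/page")
print(f"multi-vector (assumed): {multi} bytes/page, ratio {ratio:.1f}x")
print("top pages:", [i for i, _ in ranking])
```

Under these assumptions the multi-vector index comes to roughly 64 times the size of the single-vector one, in line with the figure quoted above. In practice the brute-force loop would likely be replaced by an inner-product index such as FAISS's `IndexFlatIP`.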
## Results
| Model | Params | ViDoRe v1 | ViDoRe v2 | ViDoRe v3 | Avg Retention |