Commit b5ce6d9 (verified), committed by Ryenhails
1 Parent(s): 776ba4a

Fix ColPali claim; add single-vector + storage highlights

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -68,7 +68,9 @@ NanoVDR-S-Multi is a **69M-parameter multilingual text-only** query encoder for
  - **95.1% teacher retention** — a 69M text-only model recovers 95% of a 2B VLM teacher across 22 ViDoRe datasets
  - **Outperforms DSE-Qwen2 (2B)** on multilingual v2 (+6.2) and v3 (+4.1) with **32x fewer parameters**
- - **Outperforms ColPali (~3B)** on all three benchmarks with **single-vector cosine** retrieval (no MaxSim)
+ - **Outperforms ColPali (~3B)** on multilingual v2 (+7.2) and v3 (+4.5) with **single-vector cosine** retrieval (no MaxSim)
+ - **Single-vector retrieval** — queries and documents share the same 2048-dim embedding space as [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B); retrieval is a plain dot product, FAISS-compatible, **4 KB per page** (float16)
+ - **Lightweight on storage** — 282 MB model file; doc index costs 64× less than ColPali's multi-vector patches
  - **51 ms CPU query latency** — 50x faster than DSE-Qwen2, 143x faster than ColPali
  - **6 languages**: English, German, French, Spanish, Italian, Portuguese — all >92% teacher retention
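
The storage and retrieval claims in the added bullets can be sanity-checked with a short sketch. This is illustrative only and is not code from the repository. The 2048-dim size and float16 storage come from the README bullet. The multi-vector layout (about 1030 vectors of 128 dims per page) is an assumption about a ColPali-style index and does not appear in the diff. The function names are hypothetical.

```python
import math
import random

EMBED_DIM = 2048      # from the README bullet: shared 2048-dim embedding space
FLOAT16_BYTES = 2     # float16 storage, as stated in the README bullet

# ASSUMPTION (not in the README): a ColPali-style index stores roughly
# 1030 vectors of 128 dims per page. Adjust to the model actually compared.
ASSUMED_MULTI_VECTORS = 1030
ASSUMED_MULTI_DIM = 128


def single_vector_bytes(dim=EMBED_DIM, itemsize=FLOAT16_BYTES):
    """Bytes needed to store one page as a single embedding."""
    return dim * itemsize


def multi_vector_bytes(n_vectors=ASSUMED_MULTI_VECTORS,
                       dim=ASSUMED_MULTI_DIM,
                       itemsize=FLOAT16_BYTES):
    """Bytes needed to store one page as a set of patch embeddings."""
    return n_vectors * dim * itemsize


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, docs, k=3):
    """Rank docs by cosine similarity (dot product of unit vectors)."""
    q = l2_normalize(query)
    scored = []
    for idx, doc in enumerate(docs):
        d = l2_normalize(doc)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    print(single_vector_bytes())                         # 4096 bytes = 4 KB
    print(multi_vector_bytes() / single_vector_bytes())  # 64.375 under the assumed sizes

    # Toy data: small random vectors stand in for real page embeddings.
    rng = random.Random(0)
    docs = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(20)]
    query = [x + rng.gauss(0, 0.01) for x in docs[7]]
    print(top_k(query, docs, k=1))
```

Under the assumed multi-vector layout, the per-page ratio comes out near the 64× figure in the bullet. The pure-Python loop is for clarity only; at realistic index sizes a vectorized inner-product search over pre-normalized embeddings would do the same ranking.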