---
title: NanoVDR
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# NanoVDR

**Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval**

Demo | GitHub | Dataset

**Paper:** Our arXiv preprint is currently on hold. Details on the training methodology, ablations, and full results will be available once the paper is published.


NanoVDR distills a frozen 2B VLM teacher (Qwen3-VL-Embedding-2B) into tiny text-only query encoders (69–151M parameters) for visual document retrieval. Documents are indexed offline by the teacher; queries are encoded on CPU in 51 ms via a DistilBERT forward pass β€” no vision model at query time.

Queries and documents both map into the same 2048-dim single-vector embedding space inherited from the teacher, so retrieval is a plain dot product — FAISS-compatible, with no MaxSim pooling. The document index stores just 4 KB per page (2048 float16 values), making NanoVDR 64× more storage-efficient than multi-vector retrievers like ColPali.
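The storage and scoring claims above can be checked in a few lines of NumPy. This is a minimal sketch with random placeholder embeddings (the names `doc_embeddings` and `query` are illustrative, not part of the NanoVDR API):

```python
import numpy as np

DIM = 2048  # single-vector dimension inherited from the teacher

# Storage per page: 2048 dims x 2 bytes (float16) = 4096 bytes = 4 KB
bytes_per_page = DIM * np.dtype(np.float16).itemsize
print(bytes_per_page)  # 4096

# Single-vector retrieval is a plain dot product over the page index
rng = np.random.default_rng(0)
doc_embeddings = rng.standard_normal((1000, DIM)).astype(np.float16)
query = rng.standard_normal(DIM).astype(np.float16)

scores = doc_embeddings.astype(np.float32) @ query.astype(np.float32)
top5 = np.argsort(-scores)[:5]  # indices of the 5 best-matching pages
```

By contrast, a multi-vector retriever must keep many vectors per page and score them with MaxSim, which is where the 64× storage gap comes from.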

## Models

| Model | Backbone | Params | ViDoRe v1 | ViDoRe v2 | ViDoRe v3 | Retention | CPU Latency |
|---|---|---|---|---|---|---|---|
| NanoVDR-S-Multi ⭐ | DistilBERT | 69M | 82.2 | 61.9 | 46.5 | 95.1% | 51 ms |
| NanoVDR-S | DistilBERT | 69M | 82.2 | 60.5 | 43.5 | 92.4% | 51 ms |
| NanoVDR-M | BERT-base | 112M | 82.1 | 62.2 | 44.7 | 94.0% | 101 ms |
| NanoVDR-L | ModernBERT | 151M | 82.4 | 61.5 | 44.2 | 93.4% | 109 ms |

*NDCG@5 (×100) on the ViDoRe benchmark (22 datasets). Retention = Student / Teacher. Teacher = Qwen3-VL-Embedding-2B (2.0B params).*

## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nanovdr/NanoVDR-S-Multi")
query_emb = model.encode(["What was the revenue growth in Q3 2024?"])  # (1, 2048)

# Retrieve via dot product against teacher-indexed document embeddings:
# scores = query_emb @ doc_embeddings.T
```

Documents must be indexed offline with the teacher VLM. See the NanoVDR-S-Multi model page for a complete guide.

## Acknowledgements

This project has received funding from the Business Finland co-innovation programme under grant agreement No. 69/31/2025. It is supported by the AiWo: Human-centric AI-enabled Collaborative Fieldwork Operations project (2025–2027), which aims to revolutionize fieldwork operations and enhance human-AI collaboration across the manufacturing, construction, and industrial design sectors. The calculations presented in this project were performed using computer resources within the Aalto University School of Science "Science-IT" project.