	---
	license: apache-2.0
	base_model:
	- lightonai/GTE-ModernColBERT-v1
	pipeline_tag: sentence-similarity
	tags:
	- SMVE
	- ColBERT
	- PyLate
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	datasets:
	- lightonai/ms-marco-en-bge-gemma
	language:
	- en
	---

	<p align="center">
	<svg width="300" height="84" viewBox="0 0 2000 560" fill="none" xmlns="http://www.w3.org/2000/svg">
	<rect width="100" height="100" fill="#EDEDED"/>
	<rect x="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="230" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="230" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="230" y="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="230" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" width="100" height="100" fill="#EDEDED"/>
	<rect x="1190" width="100" height="100" fill="#EDEDED"/>
	<rect x="1305" width="100" height="100" fill="#EDEDED"/>
	<rect x="1190" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="1305" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="1420" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="710" width="100" height="100" fill="#EDEDED"/>
	<rect x="825" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" y="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" width="100" height="100" fill="#EDEDED"/>
	<rect x="710" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="825" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="595" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="115" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="230" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="345" width="100" height="100" fill="#EDEDED"/>
	<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
	<rect x="1555" width="100" height="100" fill="#FE5000"/>
	<rect x="1555" y="115" width="100" height="100" fill="#FE5000"/>
	<rect x="1555" y="230" width="100" height="100" fill="#FE5000"/>
	<rect x="1785" y="115" width="100" height="100" fill="#FE5000"/>
	<rect x="1670" y="230" width="100" height="100" fill="#FE5000"/>
	<rect x="1900" width="100" height="100" fill="#FE5000"/>
	<rect x="1900" width="100" height="100" fill="#FE5000"/>
	<rect x="1785" y="345" width="100" height="100" fill="#FE5000"/>
	<rect x="1900" y="460" width="100" height="100" fill="#FE5000"/>
	<rect x="1555" y="345" width="100" height="100" fill="#FE5000"/>
	<rect x="1555" y="460" width="100" height="100" fill="#FE5000"/>
	<rect x="1900" y="460" width="100" height="100" fill="#FE5000"/>
	</svg>
	</p>

	<p align="center">
	<sup>Looking for production-ready multi-vector search? Check out <a href="https://topk.io">TopK</a>, a hybrid retrieval engine built on object storage.</sup>
	</p>

	# Iso-ModernColBERT
	This model is an isotropically corrected version of [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1).
	It's built for production use cases where retrieval speed and quality matter. Compared to the original model, this version delivers
	up to 3x faster inference in `bf16` with almost no loss in accuracy and enables scalable multi-vector retrieval through
	[Sparse Multi-Vector Encoding (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) inside [TopK](https://topk.io).
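
As a rough illustration of what "isotropic" means here, the sketch below measures anisotropy as the mean pairwise cosine similarity of a set of vectors: values near 0 mean the directions are spread evenly, values near 1 mean the vectors crowd into a narrow cone. This is a generic diagnostic on synthetic data, not the correction procedure used to build this model; the function name `mean_pairwise_cosine` and the toy data are assumptions made for illustration.

```python
import numpy as np


def mean_pairwise_cosine(vectors: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of row vectors."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(vectors)
    # Drop the diagonal (self-similarity is always 1)
    off_diagonal_sum = sims.sum() - np.trace(sims)
    return float(off_diagonal_sum / (n * (n - 1)))


rng = np.random.default_rng(0)
# Synthetic stand-ins for token embeddings (hypothetical data, 128 dimensions)
isotropic = rng.standard_normal((500, 128))
# Adding a constant offset makes every vector share one dominant direction
anisotropic = isotropic + 5.0

print(f"isotropic:   {mean_pairwise_cosine(isotropic):.3f}")
print(f"anisotropic: {mean_pairwise_cosine(anisotropic):.3f}")
```

The same function could be applied to real token embeddings from `model.encode` to compare two checkpoints, although a single number only gives a coarse view of the geometry.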

	## Usage
	Install PyLate for embeddings and the TopK SDK for retrieval.
	```
	pip install -U pylate topk-sdk
	```

	### Embed documents
	First, load the model into PyLate's `ColBERT` class and encode your documents.

	```python
	import torch
	import numpy as np
	from pylate import models


	model = models.ColBERT(
	    model_name_or_path="topk-io/Iso-ModernColBERT",
	    model_kwargs={'torch_dtype': torch.bfloat16},
	)

	documents = [
	    "document 1 text",
	    "document 2 text",
	    "document 3 text",
	]

	doc_embeddings = model.encode(
	    documents,
	    batch_size=32,
	    # Ensure that it is set to False to indicate that these are documents, not queries
	    is_query=False,
	    show_progress_bar=True,
	)
	```

	### Store document embeddings
	Index multi-vector document embeddings inside [TopK](https://topk.io), a hybrid retrieval engine built on object storage.
	To get started, [create an API key](https://console.topk.io).
	```python
	from topk_sdk import Client
	from topk_sdk.schema import matrix, multi_vector_index

	# Initialize TopK client
	client = Client(
	    api_key="<TOPK_API_KEY>",
	    region="aws-us-east-1-elastica",
	)

	# Create a collection with multi-vector index
	client.collections().create(
	    "iso-moderncolbert",
	    schema={
	        "token_embeddings": matrix(dimension=128, value_type="f16")
	            .index(multi_vector_index(metric="maxsim"))
	    }
	)

	# Upsert document embeddings
	client.collection("iso-moderncolbert").upsert([
	    {
	        "_id": str(i),
	        "token_embeddings": emb.astype(np.float16),
	        "text": text
	    }
	    for (i, (text, emb)) in enumerate(zip(documents, doc_embeddings))
	])
	```

	### Retrieve documents for queries
	Your documents are now durably persisted in the index and queryable.
	```python
	from topk_sdk.query import fn, select, field

	# Encode query string
	query_embedding = model.encode(
	    "query for document 3",
	    # Ensure that it is set to True for queries
	    is_query=True,
	    show_progress_bar=False,
	)

	# Retrieve top-k documents using the query embedding
	results = client.collection("iso-moderncolbert").query(
	    select(
	        "_id", "text",
	        # Compute maxsim between query and indexed documents
	        maxsim_score=fn.multi_vector_distance(
	            "token_embeddings",
	            query_embedding.astype(np.float16)
	        )
	    )
	    # Get the top 10 matching documents
	    .topk(field("maxsim_score"), 10)
	)

	# Print each hit with its MaxSim score
	for r in results:
	    print(f"id: {r['_id']}, score: {r['maxsim_score']}, text: {r['text']}")
	```
	TopK's query language is flexible: you can tune retrieval parameters, combine multi-vector search with metadata filters
	and keyword search, and more. Check out our [docs](https://docs.topk.io) to learn more.

	# Evaluation results

	We evaluated the model with an internal evaluation harness on two standard benchmarks: BEIR and NanoBEIR.
	As the baseline, we used [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) and evaluated its performance in fp32 and bf16 precision (denoted by `GTE fp32` and `GTE bf16`, respectively).
	The last two columns of each table (`Iso bf16` and `Δ vs GTE`) describe Iso-ModernColBERT (ours) in bf16 precision; Δ is the relative change over the named GTE baseline.
	In all configurations, we used the same SMVE implementation with width 65536 and k=32.

	## BEIR

	### NDCG@10 — ranking quality is robust to bf16

	End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). GTE-ModernColBERT-v1 loses ~7 NDCG points on average going from fp32 → bf16, about a 13% relative drop, with the worst-hit datasets (trec-covid, climate-fever, hotpotqa) dropping 12–16 points. Iso-ModernColBERT in bf16 stays within about 1.5 points of fp32 on average (53.44% vs 54.96%) and recovers most of that gap on every dataset.

	| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
	|---------------|--------------:|--------------:|------------------:|------------------:|
	| arguana | 35.81% | 30.35% | 34.63% | +14.10% |
	| climate-fever | 32.44% | 19.49% | 31.62% | +62.24% |
	| cqadupstack | 40.54% | 38.25% | 40.64% | +6.25% |
	| dbpedia | 53.96% | 48.43% | 52.84% | +9.11% |
	| fever | 88.80% | 80.67% | 87.08% | +7.95% |
	| fiqa | 45.56% | 37.15% | 43.48% | +17.04% |
	| hotpotqa | 78.36% | 66.74% | 75.85% | +13.65% |
	| msmarco | 46.12% | 41.82% | 45.30% | +8.32% |
	| nfcorpus | 37.81% | 35.98% | 37.31% | +3.70% |
	| nq | 62.24% | 52.60% | 60.45% | +14.92% |
	| quora | 86.63% | 79.58% | 85.05% | +6.87% |
	| scidocs | 19.49% | 17.82% | 18.81% | +5.56% |
	| scifact | 75.98% | 71.55% | 75.26% | +5.18% |
	| touche2020 | 31.30% | 22.93% | 29.45% | +28.43% |
	| trec-covid | 89.30% | 73.47% | 83.76% | +14.01% |
	| avg | 54.96% | 47.79% | 53.44% | +11.82% |
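
For readers unfamiliar with the scoring behind these numbers, here is a minimal NumPy sketch of exact MaxSim as it is usually defined for ColBERT-style models: each query token takes its best dot-product match among the document tokens, and the per-token maxima are summed. The function name `maxsim_score` and the toy 2-dimensional vectors are illustrative assumptions; the evaluation harness itself is internal and may differ in details such as normalization or masking.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction MaxSim between one query and one document.

    query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim).
    """
    # Token-to-token similarity matrix: (num_query_tokens, num_doc_tokens)
    sim = query_emb @ doc_emb.T
    # Best document token per query token, summed over query tokens
    return float(sim.max(axis=1).sum())


# Toy embeddings (hypothetical values, 2 dimensions instead of 128)
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]])

print(maxsim_score(query, doc))
```

Because every query token is compared with every document token, the cost grows with both sequence lengths, which is why a cheaper first stage is normally used to narrow the candidate set before exact scoring.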

	### Recall@100 — SMVE as a first stage with ~10× overfetch
	The following results show model performance when used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first-stage retriever.

	For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. With 10× overfetch, Iso-ModernColBERT's SMVE R@100 matches or exceeds fp32 MaxSim R@10 on average (56.53% vs 54.23%) and on most datasets; cqadupstack, hotpotqa, and quora are the main exceptions.
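
To make the comparison concrete, here is a minimal sketch of recall@k, assuming the common definition (the fraction of a query's relevant documents that appear in the top k results). The internal harness is not shown in this card, so the function name `recall_at_k` and the toy document IDs are illustrative only. The tables compare the first stage at depth 10·k against exact MaxSim at depth k.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents found among the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


# Toy example with hypothetical IDs
k = 2
relevant = ["d1", "d2"]
exact_ranking = ["d1", "d5", "d2", "d8"]
# A first stage may order candidates poorly, as long as they land in the larger pool
first_stage = ["d9", "d2", "d7", "d1"] + [f"x{i}" for i in range(16)]

print(recall_at_k(exact_ranking, relevant, k))     # exact scorer at depth k
print(recall_at_k(first_stage, relevant, 10 * k))  # first stage at depth 10*k
```

A first stage is doing its job when the second number is at least as high as the first: the relevant documents are in the pool that the exact scorer then reranks.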

	| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
	|---------------|---------------------:|--------------------:|------------------------:|-----------------------:|
	| arguana | 72.81% | 27.69% | 84.51% | +205.20% |
	| climate-fever | 39.27% | 0.41% | 48.84% | +11,812% ⚠ |
	| cqadupstack | 50.48% | 11.78% | 37.29% | +216.55% |
	| dbpedia | 30.45% | 8.54% | 36.89% | +331.97% |
	| fever | 94.20% | 10.05% | 94.31% | +838.41% |
	| fiqa | 52.15% | 6.45% | 49.12% | +661.55% |
	| hotpotqa | 80.73% | 12.29% | 66.59% | +441.82% |
	| msmarco | 68.64% | 27.77% | 75.83% | +173.07% |
	| nfcorpus | 18.03% | 16.63% | 25.60% | +53.94% |
	| nq | 82.03% | 14.60% | 78.85% | +440.07% |
	| quora | 94.92% | 43.73% | 82.86% | +89.48% |
	| scidocs | 20.36% | 12.29% | 29.32% | +138.57% |
	| scifact | 87.39% | 60.93% | 90.00% | +47.71% |
	| touche2020 | 19.69% | 4.47% | 40.17% | +798.66% |
	| trec-covid | 2.27% | 0.89% | 7.73% | +768.54% |
	| avg | 54.23% | 17.23% | 56.53% | +228.09% |

	> ⚠ The +11,812% on climate-fever is an artifact of a near-zero baseline (0.41%): GTE's SMVE is so broken on that dataset that the ratio explodes. Read it as "GTE SMVE doesn't work here at all", not as a meaningful magnitude.
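
The Δ values in the table are consistent with a plain relative change, `(ours - baseline) / baseline`. The short sketch below reproduces two rows under that assumption (the helper name `relative_delta` is hypothetical) and shows why a baseline close to zero inflates the ratio.

```python
def relative_delta(baseline: float, ours: float) -> float:
    """Relative change of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# arguana: GTE fp32 SMVE R@100 = 27.69%, Iso bf16 SMVE R@100 = 84.51%
print(f"{relative_delta(27.69, 84.51):+.2f}%")

# climate-fever: the 0.41% baseline turns a ~48 point gain into a five-digit ratio
print(f"{relative_delta(0.41, 48.84):+,.0f}%")
```

For rows like climate-fever, the absolute difference in recall points is the more informative quantity.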

	### Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)

	Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 matches or exceeds fp32 MaxSim R@100 on average (74.55% vs 72.64%), with cqadupstack and hotpotqa as the main exceptions, while GTE's SMVE collapses.

	| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
	|---------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
	| arguana | 95.72% | 68.31% | 97.00% | +42.00% |
	| climate-fever | 66.45% | 0.93% | 68.87% | +7,305% ⚠ |
	| cqadupstack | 71.44% | 26.78% | 55.78% | +108.29% |
	| dbpedia | 62.50% | 18.35% | 57.72% | +214.55% |
	| fever | 97.46% | 16.74% | 96.91% | +478.91% |
	| fiqa | 75.64% | 21.09% | 76.70% | +263.68% |
	| hotpotqa | 90.31% | 22.72% | 78.83% | +247.05% |
	| msmarco | 93.14% | 46.57% | 90.97% | +95.34% |
	| nfcorpus | 32.22% | 49.11% | 57.16% | +16.39% |
	| nq | 96.59% | 29.88% | 91.42% | +205.96% |
	| quora | 99.45% | 69.38% | 94.86% | +36.72% |
	| scidocs | 44.07% | 32.62% | 53.43% | +63.80% |
	| scifact | 96.00% | 89.82% | 99.33% | +10.59% |
	| touche2020 | 52.60% | 13.91% | 69.63% | +400.58% |
	| trec-covid | 16.02% | 3.85% | 29.57% | +668.05% |
	| avg | 72.64% | 34.00% | 74.55% | +119.26% |

	> ⚠ Again, climate-fever's +7,305% is driven by a near-zero baseline (0.93%) — GTE SMVE simply doesn't work on this dataset.

	## NanoBEIR

	### NDCG@10 — ranking quality is robust to bf16

	End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). GTE-ModernColBERT-v1 drops ~6 NDCG points on average going from fp32 → bf16, about a 9% relative drop, with some datasets (ArguAna, ClimateFEVER, FiQA, Touche2020) losing 8–13 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16: its average is within about 0.6 points of fp32, and no dataset trails fp32 by more than about 3.6 points.

	| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
	|----------------|--------------:|--------------:|------------------:|------------------:|
	| ArguAna | 51.98% | 43.50% | 54.31% | +24.85% |
	| ClimateFEVER | 40.46% | 27.78% | 38.17% | +37.40% |
	| DBPedia | 72.82% | 70.39% | 71.56% | +1.66% |
	| FEVER | 94.52% | 89.82% | 93.23% | +3.80% |
	| FiQA2018 | 56.64% | 44.13% | 55.79% | +26.42% |
	| HotpotQA | 89.95% | 85.64% | 90.47% | +5.64% |
	| MSMARCO | 70.89% | 68.77% | 72.56% | +5.51% |
	| NFCorpus | 39.58% | 39.20% | 38.67% | -1.35% |
	| NQ | 77.19% | 69.01% | 73.64% | +6.71% |
	| QuoraRetrieval | 97.08% | 90.60% | 96.53% | +6.54% |
	| SCIDOCS | 39.85% | 38.02% | 38.14% | +0.32% |
	| SciFact | 82.98% | 80.45% | 83.32% | +3.57% |
	| Touche2020 | 59.34% | 48.67% | 58.77% | +20.75% |
	| avg | 67.18% | 61.23% | 66.55% | +8.69% |
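
As a reference for how the N@10 columns can be read, below is a minimal per-query NDCG@k sketch. It assumes the linear-gain formulation with a log2 rank discount; the internal harness is not published here and may use a different variant, and the function names and toy judgments are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query; `relevance` maps doc id -> graded relevance."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Toy judgments (hypothetical): two relevant documents for one query
relevance = {"d1": 1, "d2": 1}

print(ndcg_at_k(["d1", "d2", "d3"], relevance))  # relevant docs ranked first
print(ndcg_at_k(["d3", "d1", "d2"], relevance))  # relevant docs pushed down one rank
```

A benchmark score is then the mean of this value over all queries, so small shifts in the order of the top results, such as those caused by reduced precision, show up directly in the metric.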

	### Recall@100 — SMVE as a first stage with ~10× overfetch
	The following results show model performance when used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first-stage retriever.

	For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers (and often exceeds) fp32 MaxSim's top-10 within 10× overfetch.

	| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
	|----------------|---------------------:|--------------------:|------------------------:|-----------------------:|
	| ArguAna | 80.00% | 32.00% | 90.00% | +181.25% |
	| ClimateFEVER | 47.07% | 20.67% | 66.97% | +224.00% |
	| DBPedia | 41.21% | 49.00% | 72.85% | +48.67% |
	| FEVER | 98.00% | 61.33% | 98.00% | +59.79% |
	| FiQA2018 | 64.12% | 23.25% | 78.93% | +239.48% |
	| HotpotQA | 92.00% | 46.00% | 90.00% | +95.65% |
	| MSMARCO | 92.00% | 84.00% | 98.00% | +16.67% |
	| NFCorpus | 15.66% | 16.33% | 24.58% | +50.52% |
	| NQ | 88.00% | 70.00% | 95.00% | +35.71% |
	| QuoraRetrieval | 98.93% | 87.93% | 96.60% | +9.86% |
	| SCIDOCS | 39.67% | 37.87% | 61.17% | +61.53% |
	| SciFact | 93.00% | 57.50% | 92.00% | +60.00% |
	| Touche2020 | 33.52% | 33.55% | 69.86% | +108.23% |
	| avg | 67.94% | 47.65% | 79.53% | +66.91% |

	### Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)

	Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 essentially matches or exceeds fp32 MaxSim R@100 across the board, while GTE's SMVE consistently undershoots.

	| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
	|----------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
	| ArguAna | 96.00% | 80.00% | 100.00% | +25.00% |
	| ClimateFEVER | 81.17% | 68.80% | 89.03% | +29.40% |
	| DBPedia | 85.58% | 84.85% | 96.20% | +13.38% |
	| FEVER | 100.00% | 94.33% | 99.00% | +4.95% |
	| FiQA2018 | 86.82% | 72.61% | 91.35% | +25.81% |
	| HotpotQA | 97.00% | 84.00% | 98.00% | +16.67% |
	| MSMARCO | 100.00% | 98.00% | 100.00% | +2.04% |
	| NFCorpus | 30.55% | 52.82% | 59.33% | +12.32% |
	| NQ | 100.00% | 91.00% | 100.00% | +9.89% |
	| QuoraRetrieval | 100.00% | 96.00% | 100.00% | +4.17% |
	| SCIDOCS | 70.67% | 78.93% | 90.80% | +15.04% |
	| SciFact | 96.00% | 93.00% | 100.00% | +7.53% |
	| Touche2020 | 77.23% | 80.46% | 93.09% | +15.70% |
	| avg | 86.23% | 82.68% | 93.60% | +13.21% |