---
license: apache-2.0
base_model:
- lightonai/GTE-ModernColBERT-v1
pipeline_tag: sentence-similarity
tags:
- SMVE
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
datasets:
- lightonai/ms-marco-en-bge-gemma
language:
- en
---
<p align="center">
<svg width="300" height="84" viewBox="0 0 2000 560" fill="none" xmlns="http://www.w3.org/2000/svg">
<rect width="100" height="100" fill="#EDEDED"/>
<rect x="115" width="100" height="100" fill="#EDEDED"/>
<rect x="230" width="100" height="100" fill="#EDEDED"/>
<rect x="345" width="100" height="100" fill="#EDEDED"/>
<rect x="460" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" width="100" height="100" fill="#EDEDED"/>
<rect x="1190" width="100" height="100" fill="#EDEDED"/>
<rect x="1305" width="100" height="100" fill="#EDEDED"/>
<rect x="1190" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1305" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="710" width="100" height="100" fill="#EDEDED"/>
<rect x="825" width="100" height="100" fill="#EDEDED"/>
<rect x="940" width="100" height="100" fill="#EDEDED"/>
<rect x="595" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" width="100" height="100" fill="#EDEDED"/>
<rect x="710" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="825" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" width="100" height="100" fill="#EDEDED"/>
<rect x="940" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1555" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="115" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="230" width="100" height="100" fill="#FE5000"/>
<rect x="1785" y="115" width="100" height="100" fill="#FE5000"/>
<rect x="1670" y="230" width="100" height="100" fill="#FE5000"/>
<rect x="1900" width="100" height="100" fill="#FE5000"/>
<rect x="1900" width="100" height="100" fill="#FE5000"/>
<rect x="1785" y="345" width="100" height="100" fill="#FE5000"/>
<rect x="1900" y="460" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="345" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="460" width="100" height="100" fill="#FE5000"/>
<rect x="1900" y="460" width="100" height="100" fill="#FE5000"/>
</svg>
</p>
<p align="center">
<sup>Looking for production-ready multi-vector search? Check out <a href="https://topk.io">TopK</a>, a hybrid retrieval engine built on object storage.</sup>
</p>
# Iso-ModernColBERT
This model is an isotropically corrected version of [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1).
It's built for production use cases where retrieval speed and quality matter. Compared to the original model, this version delivers
up to 3x faster inference in `bf16` with almost no loss in accuracy and enables scalable multi-vector retrieval through
[Sparse Multi-Vector Encoding (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) inside [TopK](https://topk.io).
## Usage
Install PyLate for embeddings and the TopK SDK for retrieval.
```
pip install -U pylate topk-sdk
```
### Embed documents
First, load the model into PyLate's `ColBERT` class and encode your documents.
```python
import torch
import numpy as np
from pylate import models

model = models.ColBERT(
    model_name_or_path="topk-io/Iso-ModernColBERT",
    model_kwargs={'torch_dtype': torch.bfloat16},
)

documents = [
    "document 1 text",
    "document 2 text",
    "document 3 text",
]

doc_embeddings = model.encode(
    documents,
    batch_size=32,
    # Set is_query=False so the inputs are encoded as documents, not queries
    is_query=False,
    show_progress_bar=True,
)
```
### Store document embeddings
Index multi-vector document embeddings inside [TopK](https://topk.io), a hybrid retrieval engine built on object storage.
To get started, [create an API key](https://console.topk.io).
```python
from topk_sdk import Client
from topk_sdk.schema import matrix, multi_vector_index

# Initialize TopK client
client = Client(
    api_key = "<TOPK_API_KEY>",
    region = "aws-us-east-1-elastica",
)

# Create a collection with multi-vector index
client.collections().create(
    "iso-moderncolbert",
    schema = {
        "token_embeddings": matrix(dimension=128, value_type="f16")
            .index(multi_vector_index(metric="maxsim"))
    }
)

# Upsert document embeddings
client.collection("iso-moderncolbert").upsert([
    {
        "_id": str(i),
        "token_embeddings": emb.astype(np.float16),
        "text": text
    }
    for (i, (text, emb)) in enumerate(zip(documents, doc_embeddings))
])
```
### Retrieve documents for queries
Your documents are now durably persisted in the index and queryable.
```python
from topk_sdk.query import fn, select, field

# Encode query string
query_embedding = model.encode(
    "query for document 3",
    # Set is_query=True so the input is encoded as a query
    is_query=True,
    show_progress_bar=False,
)

# Retrieve top-k documents using the query embedding
results = client.collection("iso-moderncolbert").query(
    select(
        "_id", "text",
        # Compute maxsim between query and indexed documents
        maxsim_score = fn.multi_vector_distance(
            "token_embeddings",
            query_embedding.astype(np.float16)
        )
    )
    # Get the top 10 matching documents
    .topk(field("maxsim_score"), 10)
)

for r in results:
    print(f"id: {r['_id']}, score: {r['maxsim_score']}, text: {r['text']}")
```
TopK's query language is flexible and allows you to tune retrieval parameters, combine multi-vector with metadata filters,
keyword search, and more. Check out our [docs](https://docs.topk.io) to learn more.
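
To sanity-check scores locally, MaxSim can also be computed directly in NumPy. The sketch below uses the standard late-interaction definition: for each query token, take the maximum similarity over all document tokens, then sum over query tokens. It assumes each embedding is a 2-D array of shape `(num_tokens, dim)`. `maxsim` and `rank_documents` are hypothetical helpers, not part of PyLate or the TopK SDK, and the resulting scores may differ slightly from what TopK returns.

```python
import numpy as np

def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    q = np.asarray(query_tokens, dtype=np.float32)
    d = np.asarray(doc_tokens, dtype=np.float32)
    sims = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())

def rank_documents(query_tokens, docs_tokens):
    """Return document indices sorted by descending MaxSim score."""
    scores = [maxsim(query_tokens, d) for d in docs_tokens]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Tiny hand-made example with 2-dimensional "token embeddings"
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.5, 0.5]])
doc_b = np.array([[0.0, 0.2]])
print(maxsim(query, doc_a), maxsim(query, doc_b))
print(rank_documents(query, [doc_b, doc_a]))
```

With real data, `query` would be `query_embedding` and the document list would be `doc_embeddings` from the snippets above.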
# Evaluation results
We evaluated the model with an internal evaluation harness on two standard benchmarks: BEIR and NanoBEIR.
For baselines, we selected [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) and evaluated its performance in fp32 and bf16 precision (denoted by `GTE fp32` and `GTE bf16`, respectively).
The last two columns of each table, **Iso bf16** and **Δ vs GTE**, describe Iso-ModernColBERT (ours) in bf16 precision.
In all configurations we used the same SMVE implementation with width 65536 and k=32.
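
The Δ columns in the tables below are consistent, up to rounding, with a relative change against the GTE baseline named in the column header. A minimal sketch of that arithmetic, where `relative_delta` is a hypothetical helper name and the inputs are the arguana NDCG@10 values from the first BEIR table:

```python
def relative_delta(ours, baseline):
    """Relative change of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# arguana, BEIR NDCG@10: Iso bf16 = 34.63%, GTE bf16 = 30.35%
print(f"{relative_delta(34.63, 30.35):+.2f}%")
```

Because the change is relative, a baseline close to zero inflates the percentage, which is why two climate-fever rows carry a warning.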
## BEIR
### NDCG@10: ranking quality is robust to bf16
End-to-end ranking quality reported as NDCG@10, using **exact MaxSim** scoring (no approximation). GTE-ModernColBERT-v1 loses ~7 NDCG points on average going from fp32 → bf16, about a 13% relative drop, with the worst-hit datasets (trec-covid, climate-fever, hotpotqa) dropping 12–16 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16, recovering most of that gap on average and on every dataset.
| dataset | GTE fp32 N@10 | GTE bf16 N@10 | **Iso bf16 N@10** | **Δ vs GTE bf16** |
|---------------|--------------:|--------------:|------------------:|------------------:|
| arguana | 35.81% | 30.35% | **34.63%** | **+14.10%** |
| climate-fever | 32.44% | 19.49% | **31.62%** | **+62.24%** |
| cqadupstack | 40.54% | 38.25% | **40.64%** | **+6.25%** |
| dbpedia | 53.96% | 48.43% | **52.84%** | **+9.11%** |
| fever | 88.80% | 80.67% | **87.08%** | **+7.95%** |
| fiqa | 45.56% | 37.15% | **43.48%** | **+17.04%** |
| hotpotqa | 78.36% | 66.74% | **75.85%** | **+13.65%** |
| msmarco | 46.12% | 41.82% | **45.30%** | **+8.32%** |
| nfcorpus | 37.81% | 35.98% | **37.31%** | **+3.70%** |
| nq | 62.24% | 52.60% | **60.45%** | **+14.92%** |
| quora | 86.63% | 79.58% | **85.05%** | **+6.87%** |
| scidocs | 19.49% | 17.82% | **18.81%** | **+5.56%** |
| scifact | 75.98% | 71.55% | **75.26%** | **+5.18%** |
| touche2020 | 31.30% | 22.93% | **29.45%** | **+28.43%** |
| trec-covid | 89.30% | 73.47% | **83.76%** | **+14.01%** |
| **avg** | **54.96%** | **47.79%** | **53.44%** | **+11.82%** |
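
For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes the common linear-gain, log2-discount formulation; the internal harness used for the numbers above may differ in details such as tie handling. `dcg_at_k` and `ndcg_at_k` are hypothetical helper names.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, qrels, k=10):
    """NDCG@k for one query; `qrels` maps document id to relevance grade."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# One relevant document, retrieved at rank 2
print(ndcg_at_k(["d2", "d1", "d3"], {"d1": 1}))
```

The reported per-dataset score is then the mean of this value over all queries in the dataset.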
### Recall@100: SMVE as a first stage with ~10× overfetch
The following results show model performance when used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers (and often exceeds) the fp32 MaxSim top-10 within 10× overfetch.
| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | **Iso bf16 SMVE R@100** | **Δ vs GTE fp32 SMVE** |
|---------------|---------------------:|--------------------:|------------------------:|-----------------------:|
| arguana | 72.81% | 27.69% | **84.51%** | **+205.20%** |
| climate-fever | 39.27% | 0.41% | **48.84%** | **+11,812%** ⚠ |
| cqadupstack | 50.48% | 11.78% | **37.29%** | **+216.55%** |
| dbpedia | 30.45% | 8.54% | **36.89%** | **+331.97%** |
| fever | 94.20% | 10.05% | **94.31%** | **+838.41%** |
| fiqa | 52.15% | 6.45% | **49.12%** | **+661.55%** |
| hotpotqa | 80.73% | 12.29% | **66.59%** | **+441.82%** |
| msmarco | 68.64% | 27.77% | **75.83%** | **+173.07%** |
| nfcorpus | 18.03% | 16.63% | **25.60%** | **+53.94%** |
| nq | 82.03% | 14.60% | **78.85%** | **+440.07%** |
| quora | 94.92% | 43.73% | **82.86%** | **+89.48%** |
| scidocs | 20.36% | 12.29% | **29.32%** | **+138.57%** |
| scifact | 87.39% | 60.93% | **90.00%** | **+47.71%** |
| touche2020 | 19.69% | 4.47% | **40.17%** | **+798.66%** |
| trec-covid | 2.27% | 0.89% | **7.73%** | **+768.54%** |
| **avg** | **54.23%** | **17.23%** | **56.53%** | **+228.09%** |
> ⚠ The +11,812% on climate-fever is an artifact of a near-zero baseline (0.41%): GTE's SMVE is so broken on that dataset that the ratio explodes. Read it as *"GTE SMVE doesn't work here at all"*, not as a meaningful magnitude.
### Recall@1000: SMVE as a first stage with ~10× overfetch (deeper pool)
Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 matches or exceeds fp32 MaxSim R@100 on average and on most datasets (cqadupstack and hotpotqa are the clearest exceptions), while GTE's SMVE collapses.
| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | **Iso bf16 SMVE R@1000** | **Δ vs GTE fp32 SMVE** |
|---------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
| arguana | 95.72% | 68.31% | **97.00%** | **+42.00%** |
| climate-fever | 66.45% | 0.93% | **68.87%** | **+7,305%** ⚠ |
| cqadupstack | 71.44% | 26.78% | **55.78%** | **+108.29%** |
| dbpedia | 62.50% | 18.35% | **57.72%** | **+214.55%** |
| fever | 97.46% | 16.74% | **96.91%** | **+478.91%** |
| fiqa | 75.64% | 21.09% | **76.70%** | **+263.68%** |
| hotpotqa | 90.31% | 22.72% | **78.83%** | **+247.05%** |
| msmarco | 93.14% | 46.57% | **90.97%** | **+95.34%** |
| nfcorpus | 32.22% | 49.11% | **57.16%** | **+16.39%** |
| nq | 96.59% | 29.88% | **91.42%** | **+205.96%** |
| quora | 99.45% | 69.38% | **94.86%** | **+36.72%** |
| scidocs | 44.07% | 32.62% | **53.43%** | **+63.80%** |
| scifact | 96.00% | 89.82% | **99.33%** | **+10.59%** |
| touche2020 | 52.60% | 13.91% | **69.63%** | **+400.58%** |
| trec-covid | 16.02% | 3.85% | **29.57%** | **+668.05%** |
| **avg** | **72.64%** | **34.00%** | **74.55%** | **+119.26%** |
> ⚠ Again, climate-fever's +7,305% is driven by a near-zero baseline (0.93%) β€” GTE SMVE simply doesn't work on this dataset.
## NanoBEIR
### NDCG@10: ranking quality is robust to bf16
End-to-end ranking quality reported as NDCG@10, using **exact MaxSim** scoring (no approximation). GTE-ModernColBERT-v1 drops ~6 NDCG points on average going from fp32 → bf16, about a 9% relative drop, with some datasets (ArguAna, ClimateFEVER, FiQA, Touche2020) losing 8–13 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16: the average is about 0.6 points below fp32, and most per-dataset gaps close to within a few percent.
| dataset | GTE fp32 N@10 | GTE bf16 N@10 | **Iso bf16 N@10** | **Δ vs GTE bf16** |
|----------------|--------------:|--------------:|------------------:|------------------:|
| ArguAna | 51.98% | 43.50% | **54.31%** | **+24.85%** |
| ClimateFEVER | 40.46% | 27.78% | **38.17%** | **+37.40%** |
| DBPedia | 72.82% | 70.39% | **71.56%** | **+1.66%** |
| FEVER | 94.52% | 89.82% | **93.23%** | **+3.80%** |
| FiQA2018 | 56.64% | 44.13% | **55.79%** | **+26.42%** |
| HotpotQA | 89.95% | 85.64% | **90.47%** | **+5.64%** |
| MSMARCO | 70.89% | 68.77% | **72.56%** | **+5.51%** |
| NFCorpus | 39.58% | 39.20% | **38.67%** | **-1.35%** |
| NQ | 77.19% | 69.01% | **73.64%** | **+6.71%** |
| QuoraRetrieval | 97.08% | 90.60% | **96.53%** | **+6.54%** |
| SCIDOCS | 39.85% | 38.02% | **38.14%** | **+0.32%** |
| SciFact | 82.98% | 80.45% | **83.32%** | **+3.57%** |
| Touche2020 | 59.34% | 48.67% | **58.77%** | **+20.75%** |
| **avg** | **67.18%** | **61.23%** | **66.55%** | **+8.69%** |
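
To get a feel for the number format itself: bf16 keeps the 8 exponent bits of fp32 but only 7 explicit mantissa bits, so each stored value carries a relative rounding error of up to 2^-8. The sketch below emulates bf16 rounding in NumPy (which has no native bfloat16 dtype) and compares a MaxSim score before and after rounding random embeddings. It only models rounding of the final embeddings, not bf16 inference through the whole model, so it does not reproduce the numbers in the tables. `round_to_bf16` and `maxsim` are hypothetical helpers.

```python
import numpy as np

def round_to_bf16(x):
    """Round finite float32 values to the nearest bfloat16 value (returned as float32)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Round to nearest, ties to even, on the 16 low bits that bf16 drops
    rounding_bias = ((bits >> np.uint32(16)) & np.uint32(1)) + np.uint32(0x7FFF)
    rounded = (bits + rounding_bias) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    return float((query_tokens @ doc_tokens.T).max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.standard_normal((8, 128)).astype(np.float32)
doc = rng.standard_normal((40, 128)).astype(np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)
doc /= np.linalg.norm(doc, axis=1, keepdims=True)

exact = maxsim(query, doc)
rounded = maxsim(round_to_bf16(query), round_to_bf16(doc))
print(f"fp32 embeddings: {exact:.4f}, bf16-rounded embeddings: {rounded:.4f}")
```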
### Recall@100: SMVE as a first stage with ~10× overfetch
The following results show model performance when used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers (and often exceeds) fp32 MaxSim's top-10 within 10× overfetch.
| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | **Iso bf16 SMVE R@100** | **Δ vs GTE fp32 SMVE** |
|----------------|---------------------:|--------------------:|------------------------:|-----------------------:|
| ArguAna | 80.00% | 32.00% | **90.00%** | **+181.25%** |
| ClimateFEVER | 47.07% | 20.67% | **66.97%** | **+224.00%** |
| DBPedia | 41.21% | 49.00% | **72.85%** | **+48.67%** |
| FEVER | 98.00% | 61.33% | **98.00%** | **+59.79%** |
| FiQA2018 | 64.12% | 23.25% | **78.93%** | **+239.48%** |
| HotpotQA | 92.00% | 46.00% | **90.00%** | **+95.65%** |
| MSMARCO | 92.00% | 84.00% | **98.00%** | **+16.67%** |
| NFCorpus | 15.66% | 16.33% | **24.58%** | **+50.52%** |
| NQ | 88.00% | 70.00% | **95.00%** | **+35.71%** |
| QuoraRetrieval | 98.93% | 87.93% | **96.60%** | **+9.86%** |
| SCIDOCS | 39.67% | 37.87% | **61.17%** | **+61.53%** |
| SciFact | 93.00% | 57.50% | **92.00%** | **+60.00%** |
| Touche2020 | 33.52% | 33.55% | **69.86%** | **+108.23%** |
| **avg** | **67.94%** | **47.65%** | **79.53%** | **+66.91%** |
### Recall@1000: SMVE as a first stage with ~10× overfetch (deeper pool)
Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 essentially matches or exceeds fp32 MaxSim R@100 across the board, while GTE's SMVE consistently undershoots.
| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | **Iso bf16 SMVE R@1000** | **Δ vs GTE fp32 SMVE** |
|----------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
| ArguAna | 96.00% | 80.00% | **100.00%** | **+25.00%** |
| ClimateFEVER | 81.17% | 68.80% | **89.03%** | **+29.40%** |
| DBPedia | 85.58% | 84.85% | **96.20%** | **+13.38%** |
| FEVER | 100.00% | 94.33% | **99.00%** | **+4.95%** |
| FiQA2018 | 86.82% | 72.61% | **91.35%** | **+25.81%** |
| HotpotQA | 97.00% | 84.00% | **98.00%** | **+16.67%** |
| MSMARCO | 100.00% | 98.00% | **100.00%** | **+2.04%** |
| NFCorpus | 30.55% | 52.82% | **59.33%** | **+12.32%** |
| NQ | 100.00% | 91.00% | **100.00%** | **+9.89%** |
| QuoraRetrieval | 100.00% | 96.00% | **100.00%** | **+4.17%** |
| SCIDOCS | 70.67% | 78.93% | **90.80%** | **+15.04%** |
| SciFact | 96.00% | 93.00% | **100.00%** | **+7.53%** |
| Touche2020 | 77.23% | 80.46% | **93.09%** | **+15.70%** |
| **avg** | **86.23%** | **82.68%** | **93.60%** | **+13.21%** |