---
license: apache-2.0
base_model:
- lightonai/GTE-ModernColBERT-v1
pipeline_tag: sentence-similarity
tags:
- SMVE
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
datasets:
- lightonai/ms-marco-en-bge-gemma
language:
- en
---
<p align="center">
<svg width="300" height="84" viewBox="0 0 2000 560" fill="none" xmlns="http://www.w3.org/2000/svg">
<rect width="100" height="100" fill="#EDEDED"/>
<rect x="115" width="100" height="100" fill="#EDEDED"/>
<rect x="230" width="100" height="100" fill="#EDEDED"/>
<rect x="345" width="100" height="100" fill="#EDEDED"/>
<rect x="460" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="230" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" width="100" height="100" fill="#EDEDED"/>
<rect x="1190" width="100" height="100" fill="#EDEDED"/>
<rect x="1305" width="100" height="100" fill="#EDEDED"/>
<rect x="1190" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1305" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="1420" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="1075" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="710" width="100" height="100" fill="#EDEDED"/>
<rect x="825" width="100" height="100" fill="#EDEDED"/>
<rect x="940" width="100" height="100" fill="#EDEDED"/>
<rect x="595" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="595" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="710" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="825" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="460" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="115" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="230" width="100" height="100" fill="#EDEDED"/>
<rect x="940" y="345" width="100" height="100" fill="#EDEDED"/>
<rect x="1555" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="115" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="230" width="100" height="100" fill="#FE5000"/>
<rect x="1785" y="115" width="100" height="100" fill="#FE5000"/>
<rect x="1670" y="230" width="100" height="100" fill="#FE5000"/>
<rect x="1900" width="100" height="100" fill="#FE5000"/>
<rect x="1785" y="345" width="100" height="100" fill="#FE5000"/>
<rect x="1900" y="460" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="345" width="100" height="100" fill="#FE5000"/>
<rect x="1555" y="460" width="100" height="100" fill="#FE5000"/>
</svg>
</p>
<p align="center">
<sup>Looking for production-ready multi-vector search? Check out <a href="https://topk.io">TopK</a>, a hybrid retrieval engine built on object storage.</sup>
</p>
# Iso-ModernColBERT
This model is an isotropically corrected version of [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1).
It is built for production use cases where both retrieval speed and quality matter. Compared to the original model, this version delivers
up to 3x faster inference in `bf16` with almost no loss in accuracy, and it enables scalable multi-vector retrieval through
[Sparse Multi-Vector Encoding (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) inside [TopK](https://topk.io).
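
For intuition about why `bf16` precision matters for retrieval scores: bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so values in [0.5, 1) are spaced 2^-8 ≈ 0.0039 apart. The minimal NumPy sketch below emulates float32 → bf16 rounding (round-to-nearest-even on the upper 16 bits; NaN/inf handling omitted). It illustrates the number format only; it is not how PyLate or TopK compute scores, and `to_bf16` is a hypothetical helper.

```python
import numpy as np

def to_bf16(x):
    """Round float32 values to the nearest bfloat16 (ties to even),
    returned as float32. NaN/inf handling is omitted for brevity."""
    x = np.asarray(x, dtype=np.float32)
    # Work in uint64 so adding the rounding bias cannot overflow.
    bits = x.view(np.uint32).astype(np.uint64)
    # Half of the dropped 16-bit range, plus the lowest kept bit for ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + rounding_bias) >> 16) << 16
    return rounded.astype(np.uint32).view(np.float32)

# Four distinct float32 scores in [0.5, 1), where bf16 spacing is 2**-8.
scores = np.array([0.9300, 0.9310, 0.9320, 0.9330], dtype=np.float32)
print(to_bf16(scores))
```

Here the four inputs collapse onto two bf16 values, so scores that differ by less than the local spacing can tie or swap after rounding. This is one plausible way reduced precision can perturb a ranking.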
## Usage
Install PyLate for embeddings and the TopK SDK for retrieval.
```bash
pip install -U pylate topk-sdk
```
### Embed documents
First, load the model with PyLate's `ColBERT` class and encode your documents.
```python
import torch
import numpy as np
from pylate import models

# Load the model in bfloat16
model = models.ColBERT(
    model_name_or_path="topk-io/Iso-ModernColBERT",
    model_kwargs={"torch_dtype": torch.bfloat16},
)

documents = [
    "document 1 text",
    "document 2 text",
    "document 3 text",
]

doc_embeddings = model.encode(
    documents,
    batch_size=32,
    # is_query=False marks these inputs as documents, not queries
    is_query=False,
    show_progress_bar=True,
)
```
### Store document embeddings
Index the multi-vector document embeddings in [TopK](https://topk.io), a hybrid retrieval engine built on object storage.
To get started, [create an API key](https://console.topk.io).
```python
from topk_sdk import Client
from topk_sdk.schema import matrix, multi_vector_index

# Initialize the TopK client
client = Client(
    api_key="<TOPK_API_KEY>",
    region="aws-us-east-1-elastica",
)

# Create a collection with a multi-vector (MaxSim) index over 128-dim f16 token embeddings
client.collections().create(
    "iso-moderncolbert",
    schema={
        "token_embeddings": matrix(dimension=128, value_type="f16")
        .index(multi_vector_index(metric="maxsim")),
    },
)

# Upsert document embeddings (cast to float16 to match the schema)
client.collection("iso-moderncolbert").upsert([
    {
        "_id": str(i),
        "token_embeddings": emb.astype(np.float16),
        "text": text,
    }
    for i, (text, emb) in enumerate(zip(documents, doc_embeddings))
])
```
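
The schema stores each document as a `num_tokens × 128` matrix of `f16` values, so raw storage grows linearly with document length. A back-of-the-envelope sketch of the matrix payload only: it ignores index structures and any per-document overhead TopK may add (not described in this card), and both the helper `raw_matrix_bytes` and the 300-token document are hypothetical.

```python
import numpy as np

def raw_matrix_bytes(num_tokens: int, dim: int = 128, dtype=np.float16) -> int:
    """Size in bytes of a num_tokens x dim token-embedding matrix (values only)."""
    return num_tokens * dim * np.dtype(dtype).itemsize

# A hypothetical 300-token document:
print(raw_matrix_bytes(300))                    # 76800 (f16)
print(raw_matrix_bytes(300, dtype=np.float32))  # 153600 (f32)
```

Casting with `emb.astype(np.float16)` before upserting halves this payload compared with float32.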
### Retrieve documents for queries
Your documents are now durably persisted in the index and can be queried.
```python
from topk_sdk.query import fn, select, field

# Encode the query string (set is_query=True for queries)
query_embedding = model.encode(
    "query for document 3",
    is_query=True,
    show_progress_bar=False,
)

# Retrieve the top-k documents using the query embedding
results = client.collection("iso-moderncolbert").query(
    select(
        "_id", "text",
        # Compute MaxSim between the query and each indexed document
        maxsim_score=fn.multi_vector_distance(
            "token_embeddings",
            query_embedding.astype(np.float16),
        ),
    )
    # Keep the 10 best-matching documents
    .topk(field("maxsim_score"), 10)
)

for r in results:
    print(f"id: {r['_id']}, score: {r['maxsim_score']}, text: {r['text']}")
```
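
For reference, MaxSim (ColBERT-style late interaction) scores a document by taking, for each query token embedding, its largest dot product with any document token embedding, then summing those maxima over the query tokens. Below is a minimal NumPy sketch of exact MaxSim ranking over in-memory arrays, assuming 2-D `(num_tokens, dim)` inputs. It is not TopK's implementation; `fn.multi_vector_distance` may differ in details such as normalization or numeric precision, and the helper names `maxsim` and `rank_documents` are hypothetical.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Exact MaxSim: for each query token, take the max dot product with any
    document token, then sum over query tokens.
    Shapes: query_emb (q_tokens, dim), doc_emb (d_tokens, dim)."""
    sim = query_emb.astype(np.float32) @ doc_emb.astype(np.float32).T  # (q_tokens, d_tokens)
    return float(sim.max(axis=1).sum())

def rank_documents(query_emb, doc_embs, k=10):
    """Return (doc_index, score) pairs for the k highest MaxSim scores."""
    scores = [maxsim(query_emb, d) for d in doc_embs]
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), scores[i]) for i in order]

# Toy example with 2-D "token embeddings"
q = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [
    np.array([[1.0, 0.0]]),              # matches only the first query token
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # matches both query tokens
]
print(rank_documents(q, docs, k=2))  # [(1, 2.0), (0, 1.0)]
```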
TopK's query language is flexible: you can tune retrieval parameters and combine multi-vector scoring with metadata filters,
keyword search, and more. See our [docs](https://docs.topk.io) to learn more.
# Evaluation results
We evaluated the model with an internal evaluation harness on two standard benchmarks: BEIR and NanoBEIR.
As a baseline, we used [GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) and evaluated it in fp32 and bf16 precision (denoted `GTE fp32` and `GTE bf16`, respectively).
The last two columns of each table, **Iso bf16** and **Δ vs GTE**, describe Iso-ModernColBERT (ours) in bf16 precision.
Every SMVE configuration uses the same SMVE implementation, with width 65536 and k=32.
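
The Δ columns match a relative change against the named baseline, Δ = (Iso − baseline) / baseline. For example, BEIR arguana NDCG@10 gives (34.63 − 30.35) / 30.35 ≈ +14.10%, and the same formula explains why Δ explodes when the baseline is near zero. A small sketch reproducing two values from the tables below (`relative_change` is a hypothetical helper):

```python
def relative_change(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# Values taken from the BEIR tables below
print(f"{relative_change(34.63, 30.35):+.2f}%")  # +14.10% (arguana NDCG@10 vs GTE bf16)
print(f"{relative_change(48.84, 0.41):+,.0f}%")  # +11,812% (climate-fever R@100 vs GTE fp32 SMVE)
```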
## BEIR
### NDCG@10: ranking quality is robust to bf16
End-to-end ranking quality, reported as NDCG@10 with **exact MaxSim** scoring (no approximation). GTE-ModernColBERT-v1 loses ~7 NDCG points on average going from fp32 → bf16 (about a 13% relative drop), and the worst-hit datasets (trec-covid, climate-fever, hotpotqa) drop 12–16 points. Iso-ModernColBERT stays close to fp32 ranking quality in bf16 (53.44% vs 54.96% on average) and recovers most of the fp32 → bf16 gap on every dataset.

| dataset | GTE fp32 N@10 | GTE bf16 N@10 | **Iso bf16 N@10** | **Δ vs GTE bf16** |
|---------------|--------------:|--------------:|------------------:|------------------:|
| arguana | 35.81% | 30.35% | **34.63%** | **+14.10%** |
| climate-fever | 32.44% | 19.49% | **31.62%** | **+62.24%** |
| cqadupstack | 40.54% | 38.25% | **40.64%** | **+6.25%** |
| dbpedia | 53.96% | 48.43% | **52.84%** | **+9.11%** |
| fever | 88.80% | 80.67% | **87.08%** | **+7.95%** |
| fiqa | 45.56% | 37.15% | **43.48%** | **+17.04%** |
| hotpotqa | 78.36% | 66.74% | **75.85%** | **+13.65%** |
| msmarco | 46.12% | 41.82% | **45.30%** | **+8.32%** |
| nfcorpus | 37.81% | 35.98% | **37.31%** | **+3.70%** |
| nq | 62.24% | 52.60% | **60.45%** | **+14.92%** |
| quora | 86.63% | 79.58% | **85.05%** | **+6.87%** |
| scidocs | 19.49% | 17.82% | **18.81%** | **+5.56%** |
| scifact | 75.98% | 71.55% | **75.26%** | **+5.18%** |
| touche2020 | 31.30% | 22.93% | **29.45%** | **+28.43%** |
| trec-covid | 89.30% | 73.47% | **83.76%** | **+14.01%** |
| **avg** | **54.96%** | **47.79%** | **53.44%** | **+11.82%** |
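
NDCG@10 rewards placing relevant documents near the top: each retrieved document's relevance gain is discounted by log2(rank + 1), and the sum (DCG) is normalized by the DCG of an ideal ordering. A minimal sketch of one common formulation with linear gains; the internal harness used for these tables is not described here, so its exact variant (for example exponential gains or tie handling) is an assumption, as is the helper name `ndcg_at_k`.

```python
import math

def ndcg_at_k(ranked_ids, qrels, k=10):
    """NDCG@k with linear gains.
    ranked_ids: document ids in ranked order.
    qrels: dict mapping doc id -> graded relevance (missing ids count as 0)."""
    def dcg(gains):
        # rank is 0-based, so position 1 gets discount log2(2) = 1
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

qrels = {"d1": 1, "d3": 1}
print(ndcg_at_k(["d1", "d3", "d2"], qrels))  # 1.0 (perfect ranking)
print(ndcg_at_k(["d2", "d1", "d3"], qrels))  # below 1.0: relevant docs pushed down
```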
### Recall@100: SMVE as a first stage with ~10× overfetch
The following results show performance when each model is used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first-stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers fp32 MaxSim top-10 recall within 10× overfetch: its R@100 matches or exceeds fp32 MaxSim R@10 on average (56.53% vs 54.23%) and on 10 of 15 datasets.

| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | **Iso bf16 SMVE R@100** | **Δ vs GTE fp32 SMVE** |
|---------------|---------------------:|--------------------:|------------------------:|-----------------------:|
| arguana | 72.81% | 27.69% | **84.51%** | **+205.20%** |
| climate-fever | 39.27% | 0.41% | **48.84%** | **+11,812%** † |
| cqadupstack | 50.48% | 11.78% | **37.29%** | **+216.55%** |
| dbpedia | 30.45% | 8.54% | **36.89%** | **+331.97%** |
| fever | 94.20% | 10.05% | **94.31%** | **+838.41%** |
| fiqa | 52.15% | 6.45% | **49.12%** | **+661.55%** |
| hotpotqa | 80.73% | 12.29% | **66.59%** | **+441.82%** |
| msmarco | 68.64% | 27.77% | **75.83%** | **+173.07%** |
| nfcorpus | 18.03% | 16.63% | **25.60%** | **+53.94%** |
| nq | 82.03% | 14.60% | **78.85%** | **+440.07%** |
| quora | 94.92% | 43.73% | **82.86%** | **+89.48%** |
| scidocs | 20.36% | 12.29% | **29.32%** | **+138.57%** |
| scifact | 87.39% | 60.93% | **90.00%** | **+47.71%** |
| touche2020 | 19.69% | 4.47% | **40.17%** | **+798.66%** |
| trec-covid | 2.27% | 0.89% | **7.73%** | **+768.54%** |
| **avg** | **54.23%** | **17.23%** | **56.53%** | **+228.09%** |

> † The +11,812% on climate-fever is an artifact of a near-zero baseline (0.41%): GTE's SMVE is so broken on that dataset that the ratio explodes. Read it as *"GTE SMVE doesn't work here at all"*, not as a meaningful magnitude.
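
The explanation given above for GTE's SMVE failure (compacted latent geometry means random anchors don't separate vectors well) can be illustrated with synthetic data. The sketch below measures anisotropy as the mean pairwise cosine similarity, assigns each vector to its most similar random unit anchor, and counts how many distinct anchors are used. This is a generic illustration of the intuition, not SMVE: the anchor count, the top-1 assignment, and the synthetic "compacted" distribution (a large shared offset plus small noise) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_anchors = 128, 2000, 256

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mean_pairwise_cosine(x):
    """Average cosine similarity between distinct rows (higher = more anisotropic)."""
    x = normalize(x)
    sim = x @ x.T
    n = len(x)
    return float((sim.sum() - n) / (n * (n - 1)))  # subtract the diagonal of ones

def distinct_anchors_used(x, anchors):
    """Assign each vector to its most similar anchor; count distinct anchors hit."""
    return len(np.unique(np.argmax(normalize(x) @ anchors.T, axis=1)))

# Hypothetical random anchors on the unit sphere
anchors = normalize(rng.standard_normal((n_anchors, dim)))

# Isotropic: directions spread evenly over the sphere.
isotropic = rng.standard_normal((n_vectors, dim))
# "Compacted" (synthetic assumption): a large shared direction plus small noise.
compacted = 10.0 * rng.standard_normal(dim) + rng.standard_normal((n_vectors, dim))

for name, x in [("isotropic", isotropic), ("compacted", compacted)]:
    print(name, round(mean_pairwise_cosine(x), 3), distinct_anchors_used(x, anchors))
```

In the compacted case almost every vector lands on the same few anchors, so anchor membership carries little information for separating candidates; the isotropic vectors spread across most anchors.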
### Recall@1000: SMVE as a first stage with ~10× overfetch (deeper pool)
The picture is similar at the next pool depth. Iso-ModernColBERT SMVE R@1000 exceeds fp32 MaxSim R@100 on average (74.55% vs 72.64%) and on 8 of 15 datasets, although it still trails noticeably on cqadupstack and hotpotqa. GTE's SMVE, by contrast, collapses (34.00% on average).

| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | **Iso bf16 SMVE R@1000** | **Δ vs GTE fp32 SMVE** |
|---------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
| arguana | 95.72% | 68.31% | **97.00%** | **+42.00%** |
| climate-fever | 66.45% | 0.93% | **68.87%** | **+7,305%** † |
| cqadupstack | 71.44% | 26.78% | **55.78%** | **+108.29%** |
| dbpedia | 62.50% | 18.35% | **57.72%** | **+214.55%** |
| fever | 97.46% | 16.74% | **96.91%** | **+478.91%** |
| fiqa | 75.64% | 21.09% | **76.70%** | **+263.68%** |
| hotpotqa | 90.31% | 22.72% | **78.83%** | **+247.05%** |
| msmarco | 93.14% | 46.57% | **90.97%** | **+95.34%** |
| nfcorpus | 32.22% | 49.11% | **57.16%** | **+16.39%** |
| nq | 96.59% | 29.88% | **91.42%** | **+205.96%** |
| quora | 99.45% | 69.38% | **94.86%** | **+36.72%** |
| scidocs | 44.07% | 32.62% | **53.43%** | **+63.80%** |
| scifact | 96.00% | 89.82% | **99.33%** | **+10.59%** |
| touche2020 | 52.60% | 13.91% | **69.63%** | **+400.58%** |
| trec-covid | 16.02% | 3.85% | **29.57%** | **+668.05%** |
| **avg** | **72.64%** | **34.00%** | **74.55%** | **+119.26%** |

> † Again, climate-fever's +7,305% is driven by a near-zero baseline (0.93%): GTE SMVE simply doesn't work on this dataset.
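
Recall@k is the fraction of a query's relevant documents that appear in the top k results, and "~10× overfetch" means comparing a first stage that returns 10× more candidates (R@100, R@1000) against the exact model's shorter list (R@10, R@100). A minimal sketch of that comparison for a single query; the document ids and rankings are hypothetical, `recall_at_k` is a hypothetical helper, and the harness's exact recall variant is not described here.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Hypothetical query with 4 relevant documents
relevant = {"d3", "d17", "d42", "d99"}
# Exact MaxSim ranking, top 10
exact_top = ["d3", "d17", "d5", "d8", "d1", "d2", "d4", "d6", "d7", "d9"]
# Approximate first stage, overfetched to 100 candidates
first_stage = exact_top[:2] + [f"x{i}" for i in range(90)] + ["d42", "d5"] + [f"y{i}" for i in range(6)]

print(recall_at_k(exact_top, relevant, k=10))     # 0.5
print(recall_at_k(first_stage, relevant, k=100))  # 0.75
```

Here the overfetched candidate pool contains more relevant documents than the exact top 10, which is the situation the "Iso bf16 SMVE" columns measure.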
## NanoBEIR
### NDCG@10: ranking quality is robust to bf16
End-to-end ranking quality, reported as NDCG@10 with **exact MaxSim** scoring (no approximation). GTE-ModernColBERT-v1 drops ~6 NDCG points on average going from fp32 → bf16 (about a 9% relative drop), and some datasets (ArguAna, ClimateFEVER, FiQA, Touche2020) lose 8–13 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16: its average is within about 0.6 points of fp32, every dataset is within about 3.6 points of its fp32 score, and several (ArguAna, HotpotQA, MSMARCO, SciFact) exceed it.

| dataset | GTE fp32 N@10 | GTE bf16 N@10 | **Iso bf16 N@10** | **Δ vs GTE bf16** |
|----------------|--------------:|--------------:|------------------:|------------------:|
| ArguAna | 51.98% | 43.50% | **54.31%** | **+24.85%** |
| ClimateFEVER | 40.46% | 27.78% | **38.17%** | **+37.40%** |
| DBPedia | 72.82% | 70.39% | **71.56%** | **+1.66%** |
| FEVER | 94.52% | 89.82% | **93.23%** | **+3.80%** |
| FiQA2018 | 56.64% | 44.13% | **55.79%** | **+26.42%** |
| HotpotQA | 89.95% | 85.64% | **90.47%** | **+5.64%** |
| MSMARCO | 70.89% | 68.77% | **72.56%** | **+5.51%** |
| NFCorpus | 39.58% | 39.20% | **38.67%** | **-1.35%** |
| NQ | 77.19% | 69.01% | **73.64%** | **+6.71%** |
| QuoraRetrieval | 97.08% | 90.60% | **96.53%** | **+6.54%** |
| SCIDOCS | 39.85% | 38.02% | **38.14%** | **+0.32%** |
| SciFact | 82.98% | 80.45% | **83.32%** | **+3.57%** |
| Touche2020 | 59.34% | 48.67% | **58.77%** | **+20.75%** |
| **avg** | **67.18%** | **61.23%** | **66.55%** | **+8.69%** |
### Recall@100: SMVE as a first stage with ~10× overfetch
The following results show performance when each model is used with [Sparse Multi-Vector Encoder (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) as a first-stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE R@100 matches or exceeds fp32 MaxSim R@10 on 10 of 13 datasets, trails by at most about 2.3 points on the rest, and is well above it on average (79.53% vs 67.94%).

| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | **Iso bf16 SMVE R@100** | **Δ vs GTE fp32 SMVE** |
|----------------|---------------------:|--------------------:|------------------------:|-----------------------:|
| ArguAna | 80.00% | 32.00% | **90.00%** | **+181.25%** |
| ClimateFEVER | 47.07% | 20.67% | **66.97%** | **+224.00%** |
| DBPedia | 41.21% | 49.00% | **72.85%** | **+48.67%** |
| FEVER | 98.00% | 61.33% | **98.00%** | **+59.79%** |
| FiQA2018 | 64.12% | 23.25% | **78.93%** | **+239.48%** |
| HotpotQA | 92.00% | 46.00% | **90.00%** | **+95.65%** |
| MSMARCO | 92.00% | 84.00% | **98.00%** | **+16.67%** |
| NFCorpus | 15.66% | 16.33% | **24.58%** | **+50.52%** |
| NQ | 88.00% | 70.00% | **95.00%** | **+35.71%** |
| QuoraRetrieval | 98.93% | 87.93% | **96.60%** | **+9.86%** |
| SCIDOCS | 39.67% | 37.87% | **61.17%** | **+61.53%** |
| SciFact | 93.00% | 57.50% | **92.00%** | **+60.00%** |
| Touche2020 | 33.52% | 33.55% | **69.86%** | **+108.23%** |
| **avg** | **67.94%** | **47.65%** | **79.53%** | **+66.91%** |
### Recall@1000: SMVE as a first stage with ~10× overfetch (deeper pool)
The same pattern holds at the next pool depth: Iso-ModernColBERT SMVE R@1000 matches or exceeds fp32 MaxSim R@100 on every dataset except FEVER (99.00% vs 100.00%), while GTE's SMVE trails Iso's on every dataset.

| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | **Iso bf16 SMVE R@1000** | **Δ vs GTE fp32 SMVE** |
|----------------|----------------------:|---------------------:|-------------------------:|-----------------------:|
| ArguAna | 96.00% | 80.00% | **100.00%** | **+25.00%** |
| ClimateFEVER | 81.17% | 68.80% | **89.03%** | **+29.40%** |
| DBPedia | 85.58% | 84.85% | **96.20%** | **+13.38%** |
| FEVER | 100.00% | 94.33% | **99.00%** | **+4.95%** |
| FiQA2018 | 86.82% | 72.61% | **91.35%** | **+25.81%** |
| HotpotQA | 97.00% | 84.00% | **98.00%** | **+16.67%** |
| MSMARCO | 100.00% | 98.00% | **100.00%** | **+2.04%** |
| NFCorpus | 30.55% | 52.82% | **59.33%** | **+12.32%** |
| NQ | 100.00% | 91.00% | **100.00%** | **+9.89%** |
| QuoraRetrieval | 100.00% | 96.00% | **100.00%** | **+4.17%** |
| SCIDOCS | 70.67% | 78.93% | **90.80%** | **+15.04%** |
| SciFact | 96.00% | 93.00% | **100.00%** | **+7.53%** |
| Touche2020 | 77.23% | 80.46% | **93.09%** | **+15.70%** |
| **avg** | **86.23%** | **82.68%** | **93.60%** | **+13.21%** |