lightonai/Agent-ModernColBERT

@@ -32,7 +32,7 @@ license: apache-2.0
 <h3 align="center">State-of-the-Art ColBERT Model for Agentic Retrieval</h3>
 # TL;DR
-A few weeks ago, we evaluated[Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT), a 150M-parameter late-interaction model trained on ReasonIR data, and found that it nearly solves [BrowseComp-Plus](https://huggingface.co/spaces/Tevatron/BrowseComp-Plus): it reaches **87.56% accuracy** with GPT-5 (**+7.59** points absolute over the previous SOTA) and also achieves the best recall and calibration error, despite not being trained **for agentic retrieval at all** (and being a year old).
+A few weeks ago, we evaluated [Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT), a 150M-parameter late-interaction model trained on ReasonIR data, and found that it nearly solves [BrowseComp-Plus](https://huggingface.co/spaces/Tevatron/BrowseComp-Plus): it reaches **87.56% accuracy** with GPT-5 (**+7.59** points absolute over the previous SOTA) and also achieves the best recall and calibration error, despite not being trained **for agentic retrieval at all** (and being a year old).
 We now present **Agent-ModernColBERT**, a model fine-tuned specifically for agentic retrieval on the [AgentIR dataset](https://huggingface.co/datasets/Tevatron/AgentIR-data) released alongside [AgentIR](https://arxiv.org/abs/2603.04384). You can find the training boilerplate [here](https://github.com/lightonai/pylate/blob/main/examples/train/agent_modern_colbert.py). This very lightweight fine-tuning improves the performance of Reason-ModernColBERT by another 10%. With the `get_document` function exposed and GPT-OSS-120B as the LLM, it beats the original GPT-5 + Qwen3-8B runs while using a 54× smaller retriever and an open-source LLM.