Add missing space
#1
by tomaarsen HF Staff - opened
README.md
CHANGED
@@ -32,7 +32,7 @@ license: apache-2.0
 <h3 align="center">State-of-the-Art ColBERT Model for Agentic Retrieval</h3>
 
 # Tl;Dr
-A few weeks ago, we evaluated[Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT), a 150M late-interaction model trained on ReasonIR data nearly solved [BrowseComp-Plus](https://huggingface.co/spaces/Tevatron/BrowseComp-Plus), reaching **87.56% accuracy** with GPT-5 (a **+7.59** absolute jump over the previous SOTA) while topping recall and calibration error, while not being trained **for agentic retrieval at all** (and being one year old).
+A few weeks ago, we evaluated [Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT), a 150M late-interaction model trained on ReasonIR data nearly solved [BrowseComp-Plus](https://huggingface.co/spaces/Tevatron/BrowseComp-Plus), reaching **87.56% accuracy** with GPT-5 (a **+7.59** absolute jump over the previous SOTA) while topping recall and calibration error, while not being trained **for agentic retrieval at all** (and being one year old).
 We now present **Agent-ModernColBERT**, a model specifically fine-tuned for agentic retrieval using the [AgentIR dataset](https://huggingface.co/datasets/Tevatron/AgentIR-data) released alongside [AgentIR](https://arxiv.org/abs/2603.04384). You can find the training boilerplate [here](https://github.com/lightonai/pylate/blob/main/examples/train/agent_modern_colbert.py
 ). This very lightweight fine-tuning adds increase the performance of Reason-ModernColBERT by another 10%, which allows, when exposing the get_document function and the GPT-OSS-120B model, to beat the original GPT-5 + Qwen3-8B runs, while using a retriever model 54× smaller and an open source LLM.
 