Spaces: illuin-conteb / README

manu committed on May 31, 2025

Commit

78154a6

verified ·

1 Parent(s): 39e9e8b

Update README.md

Files changed (1)

README.md CHANGED

@@ -12,7 +12,7 @@ pinned: true
 <img src="https://cdn-uploads.huggingface.co/production/uploads/60f2e021adf471cbdf8bb660/jq_zYRy23bOZ9qey3VY4v.png" width="800">
-This organization contains all artifacts released with our preprint [*Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings *](https://arxiv.org/abs/XXX),
 including the [ConTEB](https://huggingface.co/collections/illuin-conteb/conteb-datasets-6839fffd25f1d3685f3ad604) benchmark.
 ### Abstract
@@ -25,29 +25,13 @@ We open-source all artifacts here and at https://github.com/illuin-tech/contextu
 ## Models
-- TODO
-## Benchmark
 - [*Leaderboard*](TODO)
--
-## Datasets
-We organized datasets into collections to constitute our benchmark ViDoRe and its derivates (OCR and Captioning). Below is a brief description of each of them.
-- [*ConTEB Benchmark*](TODO)
--
-## Code
-CHANGE
-- [*Contextual Document Engine*](https://github.com/illuin-tech/contextual-document-embeddings): The code used to train and run inference with our architecture.
-- [*ConTEB Benchmarkk*](https://github.com/illuin-tech/conteb-benchmark): A Python package/CLI tool to evaluate document retrieval systems on the ViDoRe benchmark.
-## Extra
-- [*Blog*](https://huggingface.co/XXX: TODO
 - [*Preprint*](https://huggingface.co/XXX): The paper with all details !
 ## Contact

 <img src="https://cdn-uploads.huggingface.co/production/uploads/60f2e021adf471cbdf8bb660/jq_zYRy23bOZ9qey3VY4v.png" width="800">
+This organization contains all artifacts released with our preprint [*Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings*](https://arxiv.org/abs/XXX),
 including the [ConTEB](https://huggingface.co/collections/illuin-conteb/conteb-datasets-6839fffd25f1d3685f3ad604) benchmark.
 ### Abstract
 ## Models
+- [*(Model) ModernBERT*](TODO) The Contextualized ModernBERT bi-encoder trained with InSENT loss and Late Chunking
+- [*(Model) ModernColBERT*](TODO) The Contextualized ModernColBERT trained with InSENT loss and Late Chunking
 - [*Leaderboard*](TODO)
+- [*(Data) ConTEB Benchmark Datasets*](TODO)
+- [*(Code) Contextual Document Engine*](https://github.com/illuin-tech/contextual-embeddings): The code used to train and run inference with our architecture.
+- [*(Code) ConTEB Benchmark*](https://github.com/illuin-tech/conteb): A Python package/CLI tool to evaluate document retrieval systems on the ConTEB benchmark.
+- [*Blog*](https://huggingface.co/XXX): TODO
 - [*Preprint*](https://huggingface.co/XXX): The paper with all details !
 ## Contact