Commit 4e7952a by dutta18 (verified) · 1 parent: 46ca325

Create README.md

  1. README.md +59 -0
README.md ADDED
---
language:
- en
library_name: colbert
pipeline_tag: sentence-similarity
tags:
- information-retrieval
- retrieval
- late-interaction
- ColBERT
license: mit # ← change if needed
base_model: colbert-ir/colbertv1.9
---

# Colbert-Finetuned

**ColBERT** (Contextualized Late Interaction over BERT) is a retrieval model that scores queries against passages using fine-grained, token-level interactions (“late interaction”). This repo hosts a **fine-tuned ColBERT checkpoint** for neural information retrieval.

- **Base model:** `colbert-ir/colbertv1.9`
- **Library:** [`colbert`](https://github.com/stanford-futuredata/ColBERT) (with Hugging Face backbones)
- **Intended use:** passage/document retrieval in RAG and search systems

> ℹ️ ColBERT encodes queries and passages into token-level embedding matrices and uses `MaxSim` to compute relevance at search time. It often achieves better retrieval quality than single-vector embedding retrievers while remaining scalable to large collections.

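To make the `MaxSim` idea concrete, here is a minimal pure-Python sketch. It assumes cosine similarity between token embeddings and uses made-up 2-dimensional vectors; the helper names `l2_normalize` and `maxsim_score` are illustrative only and are not part of the `colbert` library, which computes this on batched tensors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_embs, doc_embs):
    """Late-interaction score: for each query token, keep its best cosine
    match among the document tokens, then sum over the query tokens."""
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy "token embeddings" (illustrative values, not real model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)  # doc_a should score higher than doc_b
```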
---

## ✨ What’s in this checkpoint

- Fine-tuned ColBERT weights, initialized from `colbert-ir/colbertv1.9`.
- Trained on **triples in JSONL** format (one `[qid, pid+, pid-]` list per line), with query and passage text looked up by ID in the **TSV** files `queries.tsv` and `collection.tsv` (ID + text per row).
- Default training hyperparameters are listed below (batch size, lr, doc_maxlen, dim, etc.).

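As a rough illustration of the data layout described above, the sketch below writes tiny sample files and reads them back. The file name `triples.jsonl`, the integer IDs, the sample text, and the `load_tsv` helper are all assumptions made for this example; they are not taken from the actual training data.

```python
import json
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# queries.tsv: one "qid \t query text" row per line
(workdir / "queries.tsv").write_text(
    "0\twhat is late interaction?\n", encoding="utf-8"
)

# collection.tsv: one "pid \t passage text" row per line
(workdir / "collection.tsv").write_text(
    "0\tColBERT scores queries and passages token by token.\n"
    "1\tBananas are rich in potassium.\n",
    encoding="utf-8",
)

# triples.jsonl (hypothetical name): one [qid, pid+, pid-] list per line
(workdir / "triples.jsonl").write_text(
    json.dumps([0, 0, 1]) + "\n", encoding="utf-8"
)


def load_tsv(path):
    """Map integer ID -> text for a two-column TSV file."""
    rows = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            ident, text = line.rstrip("\n").split("\t", 1)
            rows[int(ident)] = text
    return rows


queries = load_tsv(workdir / "queries.tsv")
collection = load_tsv(workdir / "collection.tsv")

with open(workdir / "triples.jsonl", encoding="utf-8") as f:
    triples = [json.loads(line) for line in f if line.strip()]

# Sanity check: every ID in a triple must resolve to a query or a passage.
for qid, pos_pid, neg_pid in triples:
    assert qid in queries and pos_pid in collection and neg_pid in collection
```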
---

## 🔧 Quickstart

### Option A: Use with the ColBERT library (recommended)

```python
from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert import Indexer, Searcher
from colbert.data import Queries

# 1) Index your collection (pid \t passage)
with Run().context(RunConfig(nranks=1, experiment="my-exp")):
    cfg = ColBERTConfig(root="/path/to/experiments")
    indexer = Indexer(checkpoint="dutta18/Colbert-Finetuned", config=cfg)
    indexer.index(
        name="my.index",
        collection="/path/to/collection.tsv",  # "pid \t passage text"
    )

# 2) Search with queries (qid \t query)
with Run().context(RunConfig(nranks=1, experiment="my-exp")):
    cfg = ColBERTConfig(root="/path/to/experiments")
    searcher = Searcher(index="my.index", config=cfg)
    queries = Queries("/path/to/queries.tsv")  # "qid \t query text"
    ranking = searcher.search_all(queries, k=20)
    ranking.save("my.index.top20.tsv")