ndizeye nzeyi committed on
Commit
dd9c97f
·
0 Parent(s):

Duplicate from C4IR-RW/kinyabert


Co-authored-by: Antoine Nzeyi <nzeyi@users.noreply.huggingface.co>

Files changed (28)
  1. .gitattributes +35 -0
  2. README.md +304 -0
  3. kinya_colbert_large_rw_ag_retrieval_finetuned_512D.pt +3 -0
  4. kinyabert_base_pretrained.pt +3 -0
  5. kinyabert_large_pretrained.pt +3 -0
  6. ragatouille-kinya-colbert/checkpoints/colbert-10000/artifact.metadata +60 -0
  7. ragatouille-kinya-colbert/checkpoints/colbert-10000/config.json +32 -0
  8. ragatouille-kinya-colbert/checkpoints/colbert-10000/model.safetensors +3 -0
  9. ragatouille-kinya-colbert/checkpoints/colbert-10000/special_tokens_map.json +7 -0
  10. ragatouille-kinya-colbert/checkpoints/colbert-10000/tokenizer.json +0 -0
  11. ragatouille-kinya-colbert/checkpoints/colbert-10000/tokenizer_config.json +58 -0
  12. ragatouille-kinya-colbert/checkpoints/colbert-10000/vocab.txt +0 -0
  13. ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.codes.pt +3 -0
  14. ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.metadata.json +6 -0
  15. ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.residuals.pt +3 -0
  16. ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.codes.pt +3 -0
  17. ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.metadata.json +6 -0
  18. ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.residuals.pt +3 -0
  19. ragatouille-kinya-colbert/indexes/agai-colbert-10000/avg_residual.pt +3 -0
  20. ragatouille-kinya-colbert/indexes/agai-colbert-10000/buckets.pt +3 -0
  21. ragatouille-kinya-colbert/indexes/agai-colbert-10000/centroids.pt +3 -0
  22. ragatouille-kinya-colbert/indexes/agai-colbert-10000/collection.json +0 -0
  23. ragatouille-kinya-colbert/indexes/agai-colbert-10000/doclens.0.json +1 -0
  24. ragatouille-kinya-colbert/indexes/agai-colbert-10000/doclens.1.json +1 -0
  25. ragatouille-kinya-colbert/indexes/agai-colbert-10000/ivf.pid.pt +3 -0
  26. ragatouille-kinya-colbert/indexes/agai-colbert-10000/metadata.json +73 -0
  27. ragatouille-kinya-colbert/indexes/agai-colbert-10000/pid_docid_map.json +986 -0
  28. ragatouille-kinya-colbert/indexes/agai-colbert-10000/plan.json +67 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,304 @@
1
+ ---
2
+ datasets:
3
+ - C4IR-RW/kinya-ag-retrieval
4
+ language:
5
+ - rw
6
+ metrics:
7
+ - accuracy
8
+ tags:
9
+ - kinyarwanda
10
+ - kinyabert
11
+ - bert
12
+ - colbert
13
+ - rag
14
+ - retrieval
15
+ license: cc-by-4.0
16
+ ---
17
+
18
+ # Kinyarwanda BERT and ColBERT models
19
+
20
+ In Rwanda, many farmers struggle to access timely, personalized agricultural information. Traditional channels such as radio, TV, and online sources offer limited reach and interactivity, while extension services and a national call center, staffed by only two agents for over two million farmers, face capacity constraints. To address these gaps, we developed a 24/7 AI-enabled Interactive Voice Response (IVR) tool. Accessible via a Kinyarwanda-speaking hotline, the tool provides advice on topics such as pest and disease diagnosis and agro-climatic practices, as well as information on MINAGRI’s support programs for farmers, e.g. crop insurance. By combining AI and IVR technology, this project will make agricultural advisories more accessible, timely, and responsive to farmers’ needs. For more information, please reach out to [C4IR](https://c4ir.rw/).
21
+
22
+ Implemented by: [C4IR Rwanda](https://c4ir.rw/) & [KiNLP](https://kinlp.com/); Supported by: [GIZ](https://www.giz.de/); Financed by: [BMZ](https://www.bmz.de/en).
23
+
24
+ ## Introduction
25
+
26
+ This repository contains pre-trained foundation models for Kinyarwanda passage retrieval and ranking. Running these models requires the [DeepKIN-AgAI](https://github.com/c4ir-rw/ac-ai-models/tree/main/DeepKIN-AgAI) package.
27
+
28
+ ## Example uses
29
+
30
+ ### 1. Fine-tuning a pretrained KinyaBERT model into a KinyaColBERT retrieval model
31
+
32
+ The following example uses a pre-trained KinyaBERT base model (107M parameters).
33
+
34
+ The training data for agricultural retrieval (i.e. ["C4IR-RW/kinya-ag-retrieval"](https://huggingface.co/datasets/C4IR-RW/kinya-ag-retrieval) on Hugging Face) has already been morphologically parsed. For other datasets, [MorphoKIN](https://github.com/anzeyimana/morphokin) parsing must be performed first.
35
+
36
+ ```shell
37
+
38
+ # 1. Copy the "kinya-ag-retrieval" dataset from Hugging Face into a local directory, e.g. /home/ubuntu/DATA/kinya-ag-retrieval/
39
+
40
+ # 2. Copy "kinyabert_base_pretrained.pt" model into a local directory, e.g. /home/ubuntu/DATA/kinyabert_base_pretrained.pt
41
+
42
+ # 3. Run the following training script from DeepKIN-AgAI package:
43
+
44
+ python3 DeepKIN-AgAI/deepkin/train/flex_trainer.py \
45
+ --model_variant="kinya_colbert:base" \
46
+ --colbert_embedding_dim=512 \
47
+ --gpus=1 \
48
+ --batch_size=12 \
49
+ --accumulation_steps=10 \
50
+ --dataloader_num_workers=4 \
51
+ --dataloader_persistent_workers=True \
52
+ --dataloader_pin_memory=True \
53
+ --use_ddp=False \
54
+ --use_mtl_optimizer=False \
55
+ --warmup_iter=2000 \
56
+ --peak_lr=1e-5 \
57
+ --lr_decay_style="cosine" \
58
+ --num_iters=152630 \
59
+ --dataset_max_seq_len=512 \
60
+ --use_iterable_dataset=False \
61
+ --train_log_steps=1 \
62
+ --checkpoint_steps=1000 \
63
+ --pretrained_bert_model_file="/home/ubuntu/DATA/kinyabert_base_pretrained.pt" \
64
+ --qa_train_query_id="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_query_id.txt" \
65
+ --qa_train_query_text="/home/ubuntu/DATA/kinya-ag-retrieval/parsed_rw_ag_retrieval_query_text.txt" \
66
+ --qa_train_passage_id="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_passage_id.txt" \
67
+ --qa_train_passage_text="/home/ubuntu/DATA/kinya-ag-retrieval/parsed_rw_ag_retrieval_passage_text.txt" \
68
+ --qa_train_qpn_triples="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_all.tsv" \
69
+ --qa_dev_query_id="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_query_id.txt" \
70
+ --qa_dev_query_text="/home/ubuntu/DATA/kinya-ag-retrieval/parsed_rw_ag_retrieval_query_text.txt" \
71
+ --qa_dev_passage_id="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_passage_id.txt" \
72
+ --qa_dev_passage_text="/home/ubuntu/DATA/kinya-ag-retrieval/parsed_rw_ag_retrieval_passage_text.txt" \
73
+ --qa_dev_qpn_triples="/home/ubuntu/DATA/kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_dev.tsv" \
74
+ --load_saved_model=True \
75
+ --model_save_path="/home/ubuntu/DATA/kinya_colbert_base_rw_ag_retrieval_new.pt"
76
+
77
+
78
+ ```
79
+
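The flags in the training command imply an effective batch size and a learning-rate curve that are easy to misread. The following is a minimal sketch, assuming that `--accumulation_steps` simply multiplies the per-GPU batch and that the trainer applies linear warmup followed by cosine decay to zero. The actual behaviour of `flex_trainer.py` may differ, and `effective_batch_size` and `lr_at` are hypothetical helpers, not part of DeepKIN-AgAI.

```python
import math

# Values copied from the training command above.
BATCH_SIZE = 12
ACCUMULATION_STEPS = 10
GPUS = 1
WARMUP_ITER = 2000
PEAK_LR = 1e-5
NUM_ITERS = 152630


def effective_batch_size(batch_size=BATCH_SIZE,
                         accumulation_steps=ACCUMULATION_STEPS,
                         gpus=GPUS):
    """Examples contributing to one optimizer step (assumed: simple product)."""
    return batch_size * accumulation_steps * gpus


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_ITER, total=NUM_ITERS):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 1000, 2000, 77315, 152630):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions each optimizer step sees 120 query/passage triples, and the learning rate peaks at iteration 2000 before decaying over the remaining iterations.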
80
+ ### 2. Running an API server for KinyaColBERT agricultural retrieval
81
+
82
+ 1. First, run the [MorphoKIN](https://github.com/anzeyimana/morphokin) server on a Unix domain socket:
83
+
84
+ ```shell
85
+
86
+ # Launch a daemon container
87
+
88
+ docker run -d -v /home/ubuntu/MORPHODATA:/MORPHODATA \
89
+ --gpus all morphokin:latest morphokin \
90
+ --morphokin_working_dir /MORPHODATA \
91
+ --morphokin_config_file /MORPHODATA/data/analysis_config_file.conf \
92
+ --task RMS \
93
+ --kinlp_license /MORPHODATA/licenses/KINLP_LICENSE_FILE.dat \
94
+ --ca_roots_pem_file /MORPHODATA/data/roots.pem \
95
+ --morpho_socket /MORPHODATA/run/morpho.sock
96
+
97
+
98
+ ```
99
+
100
+ 2. Wait for MorphoKIN socket server to be ready by monitoring the container logs.
101
+
102
+ ```shell
103
+
104
+ docker container ls
105
+
106
+ docker logs -f <CONTAINER ID>
107
+
108
+ # MorphoKIN server is ready once you see a message like this: MorphoKin server listening on UNIX SOCKET: /MORPHODATA/run/morpho.sock
109
+
110
+ ```
111
+
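Watching the logs works well interactively. For scripted start-up, one alternative is to poll for the socket file itself. The sketch below is an assumption-laden substitute for the log check, not something MorphoKIN provides: `wait_for_unix_socket` is a hypothetical helper, and the host path is inferred from the `-v /home/ubuntu/MORPHODATA:/MORPHODATA` volume mapping used in step 1. It only confirms that a socket file exists, not that the server already accepts requests.

```python
import os
import stat
import time


def wait_for_unix_socket(path, timeout_s=300.0, poll_s=1.0):
    """Poll until `path` exists and is a Unix domain socket, or time out.

    Returns True if the socket file appeared before the deadline, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # socket file not created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Host-side path, assuming the volume mapping from step 1.
    ready = wait_for_unix_socket("/home/ubuntu/MORPHODATA/run/morpho.sock", timeout_s=5.0)
    print("MorphoKIN socket present:", ready)
```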
112
+ 3. Then, run the retrieval API server:
113
+
114
+ ```shell
115
+
116
+ mkdir -p /home/ubuntu/DATA/agai_index
117
+
118
+ python3 DeepKIN-AgAI/deepkin/production/agai_backend.py
119
+
120
+ ```
121
+
122
+ ### 3. Evaluating KinyaColBERT pre-trained model on ["C4IR-RW/kinya-ag-retrieval"](https://huggingface.co/datasets/C4IR-RW/kinya-ag-retrieval)
123
+
124
+
125
+ ```python
126
+ import progressbar
127
+ import torch
128
+ import torch.nn.functional as F
129
+
130
+ from deepkin.clib.libkinlp.kinlpy import ParsedFlexSentence
131
+ from deepkin.data.morpho_qa_triple_data import DOCUMENT_TYPE_ID, QUESTION_TYPE_ID
132
+ from deepkin.models.kinyabert import KinyaColBERT
133
+ from deepkin.utils.misc_functions import read_lines
134
+
135
+ DATA_DIR = '/home/ubuntu/DATA'
136
+ rank = 0
137
+ pretrained_model_file = f'{DATA_DIR}/kinya_colbert_large_rw_ag_retrieval_finetuned_512D.pt'
138
+ keyword = f'kinya_colbert_large'
139
+
140
+ qa_query_id = f'{DATA_DIR}/kinya-ag-retrieval/rw_ag_retrieval_query_id.txt'
141
+ qa_query_text = f'{DATA_DIR}/kinya-ag-retrieval/parsed_rw_ag_retrieval_query_text.txt'
142
+ qa_passage_id = f'{DATA_DIR}/kinya-ag-retrieval/rw_ag_retrieval_passage_id.txt'
143
+ qa_passage_text = f'{DATA_DIR}/kinya-ag-retrieval/parsed_rw_ag_retrieval_passage_text.txt'
144
+
145
+ all_queries = {idx: ParsedFlexSentence(txt) for idx, txt in zip(read_lines(qa_query_id), read_lines(qa_query_text))}
146
+ all_passages = {idx: ParsedFlexSentence(txt) for idx, txt in zip(read_lines(qa_passage_id), read_lines(qa_passage_text))}
147
+
148
+ print(f'Got: {len(all_queries)} queries, {len(all_passages)} passages', flush=True)
149
+
150
+ device = torch.device('cuda:%d' % rank)
151
+
152
+ model, args = KinyaColBERT.from_pretrained(device, pretrained_model_file, ret_args=True)
153
+ model.float()
154
+ model.eval()
155
+
156
+ passage_embeddings = dict()
157
+ DocPool = None
158
+ QueryPool = None
159
+ with torch.no_grad():
160
+     print(f'{keyword} Embedding passages ...', flush=True)
161
+     with progressbar.ProgressBar(max_value=len(all_passages), redirect_stdout=True) as bar:
162
+         for itr, (passage_id, passage) in enumerate(all_passages.items()):
163
+             if (itr % 100) == 0:
164
+                 bar.update(itr)
165
+             passage.trim(508)
166
+             with torch.no_grad():
167
+                 D = model.get_colbert_embeddings([passage], DOCUMENT_TYPE_ID)
168
+             DocPool = D.view(-1,D.size(-1)) if DocPool is None else torch.cat((DocPool, D.view(-1,D.size(-1))))
169
+             passage_embeddings[passage_id] = D
170
+
171
+ query_embeddings = dict()
172
+ Doc_Mean = DocPool.mean(dim=0)
173
+ Doc_Stdev = DocPool.std(dim=0)
174
+ del DocPool
175
+ print(f'{keyword} Embedding queries ...', flush=True)
176
+ with progressbar.ProgressBar(max_value=len(all_queries), redirect_stdout=True) as bar:
177
+     for itr, (query_id, query) in enumerate(all_queries.items()):
178
+         if (itr % 1000) == 0:
179
+             bar.update(itr)
180
+         query.trim(508)
181
+         with torch.no_grad():
182
+             Q = model.get_colbert_embeddings([query], QUESTION_TYPE_ID)
183
+         QueryPool = Q.view(-1, Q.size(-1)) if QueryPool is None else torch.cat((QueryPool, Q.view(-1, Q.size(-1))))
184
+         query_embeddings[query_id] = Q
185
+
186
+ Query_Mean = QueryPool.mean(dim=0)
187
+ Query_Stdev = QueryPool.std(dim=0)
188
+ del QueryPool
189
+
190
+ dev_triples = f'{DATA_DIR}/kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_dev.tsv'
191
+ test_triples = f'{DATA_DIR}/kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_test.tsv'
192
+
193
+ EVAL_SETS = [('DEV', dev_triples),
194
+ ('TEST', test_triples)]
195
+
196
+ for eval_set_name, eval_qpn_triples in EVAL_SETS:
197
+     eval_query_to_passage_ids = {(line.split('\t')[0]): (line.split('\t')[1]) for line in read_lines(eval_qpn_triples)}
198
+     Top = [1, 5, 10, 20, 30]
199
+     TopAcc = [0.0 for _ in Top]
200
+     MTop = [5, 10, 20, 30]
201
+     MRR = [0.0 for _ in MTop]
202
+     Total = 0.0
203
+     for itr,(query_id,target_doc_id) in enumerate(eval_query_to_passage_ids.items()):
204
+         query = all_queries[query_id]
205
+         with torch.no_grad():
206
+             Q = model.get_colbert_embeddings([query], QUESTION_TYPE_ID)
207
+             Q = (Q - Query_Mean) / Query_Stdev
208
+             Q = F.normalize(Q, p=2, dim=2)
209
+         results = []
210
+         for doc_id,D in passage_embeddings.items():
211
+             D = (D - Doc_Mean) / Doc_Stdev
212
+             D = F.normalize(D, p=2, dim=2)
213
+             with torch.no_grad():
214
+                 score = model.pairwise_score(Q,D).squeeze().item()
215
+             score = score / Q.size(1)
216
+             results.append((score, doc_id))
217
+         Total += 1.0
218
+         results = sorted(results, key=lambda x: x[0], reverse=True)
219
+         for i, t in enumerate(Top):
220
+             TopAcc[i] += (1.0 if (target_doc_id in {idx for sc, idx in results[:t]}) else 0.0)
221
+         for i, t in enumerate(MTop):
222
+             top_rr = [(1 / (i + 1)) for i, (sc, idx) in enumerate(results[:t]) if idx == target_doc_id]
223
+             MRR[i] += (top_rr[0] if (len(top_rr) > 0) else 0.0)
224
+     print(f'-------------------------------------------------------------------------------------------------')
225
+     for i, t in enumerate(Top):
226
+         print(f'@{eval_set_name} Final {keyword}-{args.colbert_embedding_dim} kinya-ag-retrieval {eval_set_name} Set Top#{t} Accuracy:',
227
+               f'{(100.0 * TopAcc[i] / Total): .1f}% ({TopAcc[i]:.0f} / {Total:.0f})')
228
+     for i, t in enumerate(MTop):
229
+         print(f'@{eval_set_name} Final {keyword}-{args.colbert_embedding_dim} kinya-ag-retrieval {eval_set_name} Set MRR@{t}:',
230
+               f'{(100.0 * MRR[i] / Total): .1f}% ({MRR[i]:.0f} / {Total:.0f})')
231
+     print(f'-------------------------------------------------------------------------------------------------', flush=True)
232
+
233
+
234
+ ```
235
+
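The evaluation script scores each query/passage pair with `model.pairwise_score(Q, D)` on L2-normalized token embeddings and then divides by the number of query tokens. The README does not show what `pairwise_score` computes. Assuming it is the usual ColBERT late-interaction (MaxSim) score, the idea can be sketched in plain Python as follows. `l2_normalize` and `maxsim_score` are hypothetical names, the toy vectors are made up, and the mean/standard-deviation standardization step from the script is omitted for brevity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take its best cosine
    match among the document tokens, then average over query tokens."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total / len(q)


# Toy 2-D token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_match = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
doc_partial = [[1.0, 0.0]]

print(maxsim_score(query, doc_match))    # every query token has an exact match
print(maxsim_score(query, doc_partial))  # only the first query token matches
```

Averaging over query tokens keeps scores comparable across queries of different lengths, which appears to be the purpose of the `score / Q.size(1)` line in the script.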
236
+
237
+ ### 4. Evaluating pre-trained RAGatouille ColBERT model on ["C4IR-RW/kinya-ag-retrieval"](https://huggingface.co/datasets/C4IR-RW/kinya-ag-retrieval)
238
+
239
+
240
+ ```python
241
+ from deepkin.utils.misc_functions import read_lines
242
+ from ragatouille import RAGPretrainedModel
243
+
244
+ keyword = 'agai-colbert-10000'
245
+ print(f'Evaluating {keyword} ...', flush=True)
246
+ qa_query_id = 'kinya-ag-retrieval/rw_ag_retrieval_query_id.txt'
247
+ qa_query_text = 'kinya-ag-retrieval/rw_ag_retrieval_query_text.txt'
248
+
249
+ all_queries = {idx: txt for idx, txt in zip(read_lines(qa_query_id), read_lines(qa_query_text))}
250
+
251
+ print(f'Got: {len(all_queries)} queries', flush=True)
252
+
253
+ RAG = RAGPretrainedModel.from_index(f'ragatouille-kinya-colbert/indexes/agai-colbert-10000/')
254
+
255
+ dev_triples = 'kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_dev.tsv'
256
+ test_triples = 'kinya-ag-retrieval/rw_ag_retrieval_qpntriplets_test.tsv'
257
+
258
+ EVAL_SETS = [('DEV', dev_triples),
259
+ ('TEST', test_triples)]
260
+
261
+ for eval_set_name, eval_qpn_triples in EVAL_SETS:
262
+     eval_query_to_passage_ids = {(line.split('\t')[0]): (line.split('\t')[1]) for line in
263
+                                  read_lines(eval_qpn_triples)}
264
+     Top = [1, 5, 10, 20, 30]
265
+     TopAcc = [0.0 for _ in Top]
266
+     MTop = [5, 10, 20, 30]
267
+     MRR = [0.0 for _ in MTop]
268
+     Total = 0.0
269
+     for itr, (query_id, target_doc_id) in enumerate(eval_query_to_passage_ids.items()):
270
+         query = all_queries[query_id]
271
+         results = RAG.search(query=query, k=max(max(50, max(Top)), max(MTop)))
272
+         results = [(d['score'],d['document_id']) for d in results]
273
+         Total += 1.0
274
+         results = sorted(results, key=lambda x: x[0], reverse=True)
275
+         for i, t in enumerate(Top):
276
+             TopAcc[i] += (1.0 if (target_doc_id in {idx for sc, idx in results[:t]}) else 0.0)
277
+         for i, t in enumerate(MTop):
278
+             top_rr = [(1 / (i + 1)) for i, (sc, idx) in enumerate(results[:t]) if idx == target_doc_id]
279
+             MRR[i] += (top_rr[0] if (len(top_rr) > 0) else 0.0)
280
+     print(f'-------------------------------------------------------------------------------------------------')
281
+     for i, t in enumerate(Top):
282
+         print(f'@{eval_set_name} Final {keyword} kinya-ag-retrieval {eval_set_name} Set Top#{t} Accuracy:',
283
+               f'{(100.0 * TopAcc[i] / Total): .1f}% ({TopAcc[i]:.0f} / {Total:.0f})')
284
+     for i, t in enumerate(MTop):
285
+         print(f'@{eval_set_name} Final {keyword} kinya-ag-retrieval {eval_set_name} Set MRR@{t}:',
286
+               f'{(100.0 * MRR[i] / Total): .1f}% ({MRR[i]:.0f} / {Total:.0f})')
287
+     print(f'-------------------------------------------------------------------------------------------------',
288
+           flush=True)
289
+
290
+
291
+ ```
292
+
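Both evaluation scripts report the same two metrics: top-k accuracy (is the target passage among the first k results?) and MRR@k (reciprocal rank of the target within the first k, zero if absent). The sketch below isolates that computation on toy data so it can be checked by hand. `topk_accuracy_and_mrr` is a hypothetical helper and the IDs are invented. Like the scripts, it assumes one target passage per query, so if a triples file lists several positives for a query, a dict built the same way keeps only the last.

```python
def topk_accuracy_and_mrr(rankings, targets, top_ks=(1, 5, 10), mrr_ks=(5, 10)):
    """rankings: {query_id: [doc_id, ...]} ordered best first.
    targets:  {query_id: doc_id} with the single relevant passage per query.
    Returns (accuracy_by_k, mrr_by_k) as fractions in [0, 1]."""
    hits = {k: 0.0 for k in top_ks}
    rr = {k: 0.0 for k in mrr_ks}
    for qid, target in targets.items():
        ranked = rankings[qid]
        for k in top_ks:
            hits[k] += 1.0 if target in ranked[:k] else 0.0
        for k in mrr_ks:
            top = ranked[:k]
            rr[k] += 1.0 / (top.index(target) + 1) if target in top else 0.0
    n = float(len(targets))
    return ({k: v / n for k, v in hits.items()},
            {k: v / n for k, v in rr.items()})


# Toy example with invented IDs.
rankings = {
    "q1": ["p3", "p1", "p2"],  # target at rank 2
    "q2": ["p2", "p9", "p4"],  # target at rank 1
    "q3": ["p7", "p8", "p5"],  # target not retrieved
}
targets = {"q1": "p1", "q2": "p2", "q3": "p6"}

acc, mrr = topk_accuracy_and_mrr(rankings, targets, top_ks=(1, 2), mrr_ks=(2,))
print(acc, mrr)
```

Note that the scripts print MRR multiplied by 100 with a percent sign, whereas this helper returns plain fractions.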
293
+ ## References
294
+
295
+ [1] Antoine Nzeyimana and Andre Niyongabo Rubungo. 2022. [KinyaBERT: a Morphology-aware Kinyarwanda Language Model](https://aclanthology.org/2022.acl-long.367/). In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5347–5363, Dublin, Ireland. Association for Computational Linguistics.
296
+
297
+ [2] Antoine Nzeyimana and Andre Niyongabo Rubungo. 2025. [KinyaColBERT: A Lexically Grounded Retrieval Model for Low-Resource Retrieval-Augmented Generation](https://arxiv.org/abs/2507.03241). arXiv preprint arXiv:2507.03241.
298
+
299
+
300
+ ## License
301
+
302
+ This model is licensed under the [Creative Commons Attribution 4.0 International License (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
303
+
304
+ **Attribution:** Please attribute this work to C4IR Rwanda and KiNLP.
kinya_colbert_large_rw_ag_retrieval_finetuned_512D.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd66f10c8b11480a6453d21adc9d0b77977ca638b762c4b78d5767af7e7b4a34
3
+ size 1460919402
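The three-line entries for `.pt` and `.safetensors` files in this commit are Git LFS pointers, not the weights themselves: each records the spec version, a SHA-256 object ID, and the size in bytes of the real file. As a small illustration, the sketch below parses the pointer shown above and converts its size to GiB. `parse_lfs_pointer` is a hypothetical helper written for this example, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fd66f10c8b11480a6453d21adc9d0b77977ca638b762c4b78d5767af7e7b4a34
size 1460919402
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"], f"{info['size'] / 2**30:.2f} GiB")
```

After downloading, the digest can be compared against a locally computed SHA-256 of the file to confirm the download is complete.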
kinyabert_base_pretrained.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5273a0dcbd2b8d533aecd269af167528bf4d40ead8d75909401b73866273a29a
3
+ size 1288602328
kinyabert_large_pretrained.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8a6539d9b1a5bdd9408980c2cc6c38da5f7993bfcc5f41768d10021c2341acc
3
+ size 4402338492
ragatouille-kinya-colbert/checkpoints/colbert-10000/artifact.metadata ADDED
@@ -0,0 +1,60 @@
1
+ {
2
+ "query_token_id": "[unused0]",
3
+ "doc_token_id": "[unused1]",
4
+ "query_token": "[Q]",
5
+ "doc_token": "[D]",
6
+ "ncells": null,
7
+ "centroid_score_threshold": null,
8
+ "ndocs": null,
9
+ "load_index_with_mmap": false,
10
+ "index_path": null,
11
+ "index_bsize": 64,
12
+ "nbits": 8,
13
+ "kmeans_niters": 4,
14
+ "resume": false,
15
+ "pool_factor": 1,
16
+ "clustering_mode": "hierarchical",
17
+ "protected_tokens": 0,
18
+ "similarity": "cosine",
19
+ "bsize": 16,
20
+ "accumsteps": 1,
21
+ "lr": 5e-6,
22
+ "maxsteps": 500000,
23
+ "save_every": 5941,
24
+ "warmup": 5941,
25
+ "warmup_bert": null,
26
+ "relu": false,
27
+ "nway": 2,
28
+ "use_ib_negatives": true,
29
+ "reranker": false,
30
+ "distillation_alpha": 1.0,
31
+ "ignore_scores": false,
32
+ "model_name": "AfroColBERT",
33
+ "query_maxlen": 32,
34
+ "attend_to_mask_tokens": false,
35
+ "interaction": "colbert",
36
+ "dim": 1024,
37
+ "doc_maxlen": 512,
38
+ "mask_punctuation": true,
39
+ "checkpoint": "Davlan\/bert-base-multilingual-cased-finetuned-kinyarwanda",
40
+ "triples": "\/mnt\/DATA\/AfroColBERT\/data\/triples.train.colbert.jsonl",
41
+ "collection": "\/mnt\/DATA\/AfroColBERT\/data\/corpus.train.colbert.tsv",
42
+ "queries": "\/mnt\/DATA\/AfroColBERT\/data\/queries.train.colbert.tsv",
43
+ "index_name": null,
44
+ "overwrite": false,
45
+ "root": ".ragatouille\/",
46
+ "experiment": "colbert",
47
+ "index_root": null,
48
+ "name": "2025-05\/23\/11.52.12",
49
+ "rank": 0,
50
+ "nranks": 2,
51
+ "amp": true,
52
+ "gpus": 2,
53
+ "avoid_fork_if_possible": false,
54
+ "meta": {
55
+ "hostname": "nzeyi-x670e-e",
56
+ "current_datetime": "May 23, 2025 ; 12:47PM EDT (-0400)",
57
+ "cmd": "\/home\/nzeyi\/projects\/nzeyi\/kinlp\/flexkin\/scripts\/flex\/train\/colbert\/ragatouille_ag_train.py",
58
+ "version": "colbert-v0.4"
59
+ }
60
+ }
ragatouille-kinya-colbert/checkpoints/colbert-10000/config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "Davlan/bert-base-multilingual-cased-finetuned-kinyarwanda",
3
+ "architectures": [
4
+ "HF_ColBERT"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "directionality": "bidi",
9
+ "gradient_checkpointing": false,
10
+ "hidden_act": "gelu",
11
+ "hidden_dropout_prob": 0.1,
12
+ "hidden_size": 768,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 3072,
15
+ "layer_norm_eps": 1e-12,
16
+ "max_position_embeddings": 512,
17
+ "model_type": "bert",
18
+ "num_attention_heads": 12,
19
+ "num_hidden_layers": 12,
20
+ "pad_token_id": 0,
21
+ "pooler_fc_size": 768,
22
+ "pooler_num_attention_heads": 12,
23
+ "pooler_num_fc_layers": 3,
24
+ "pooler_size_per_head": 128,
25
+ "pooler_type": "first_token_transform",
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.48.3",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 119547
32
+ }
ragatouille-kinya-colbert/checkpoints/colbert-10000/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:34757553ef11e93fad8b6fcfaf4794bf59e87a3cfbecddf1be37337aac062f61
3
+ size 714582952
ragatouille-kinya-colbert/checkpoints/colbert-10000/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
ragatouille-kinya-colbert/checkpoints/colbert-10000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
ragatouille-kinya-colbert/checkpoints/colbert-10000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": false,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
ragatouille-kinya-colbert/checkpoints/colbert-10000/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.codes.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:567d4c3eddf91b66651c8346d9acbbd0015c8b25aa7c13bcb5331791e5531bbb
3
+ size 345308
ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.metadata.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "passage_offset": 0,
3
+ "num_passages": 493,
4
+ "num_embeddings": 86043,
5
+ "embedding_offset": 0
6
+ }
ragatouille-kinya-colbert/indexes/agai-colbert-10000/0.residuals.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:24a63ee080fa49edc681773664bc192f30902201d556b19d4d8eb64448b0fb74
3
+ size 44055216
ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.codes.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e8bc06c78f227ce6eb8f98e1625118209b2291daab0faa0aec2064539266c5b
3
+ size 427996
ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.metadata.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "passage_offset": 493,
3
+ "num_passages": 491,
4
+ "num_embeddings": 106705,
5
+ "embedding_offset": 86043
6
+ }
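The two chunk metadata files describe how the index is split: each chunk records where its passages and token embeddings start and how many it holds. A quick consistency check, sketched below with the values copied from `0.metadata.json` and `1.metadata.json`, confirms that the chunks are contiguous. `check_contiguous` is a hypothetical helper, not part of ColBERT or RAGatouille.

```python
# Values copied from 0.metadata.json and 1.metadata.json in this commit.
chunks = [
    {"passage_offset": 0, "num_passages": 493,
     "num_embeddings": 86043, "embedding_offset": 0},
    {"passage_offset": 493, "num_passages": 491,
     "num_embeddings": 106705, "embedding_offset": 86043},
]


def check_contiguous(chunks):
    """Verify each chunk starts where the previous one ended.

    Returns (total_passages, total_embeddings); raises ValueError on a gap."""
    passages = embeddings = 0
    for i, c in enumerate(chunks):
        if c["passage_offset"] != passages:
            raise ValueError(f"chunk {i}: unexpected passage_offset")
        if c["embedding_offset"] != embeddings:
            raise ValueError(f"chunk {i}: unexpected embedding_offset")
        passages += c["num_passages"]
        embeddings += c["num_embeddings"]
    return passages, embeddings


total_passages, total_embeddings = check_contiguous(chunks)
print(total_passages, total_embeddings)
```

The passage total of 984 agrees with the "list with 984 elements" note in the index-level `metadata.json`.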
ragatouille-kinya-colbert/indexes/agai-colbert-10000/1.residuals.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82ced4a7b272d593ea2c3994292fc138b6471e3aae0053e723f7135721f87af1
3
+ size 54634160
ragatouille-kinya-colbert/indexes/agai-colbert-10000/avg_residual.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f441f55a6baeb9c35624e114de8554e464147a266f9dace6b240592dbf5fab3
3
+ size 1205
ragatouille-kinya-colbert/indexes/agai-colbert-10000/buckets.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e5067854deaf83ee47a09e91163bc26a09a7b2f69ad114bcbe9d3172ec64df4
3
+ size 1432
ragatouille-kinya-colbert/indexes/agai-colbert-10000/centroids.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:15d6d097c1ed2b23816f2c1992c4a974d94d098de7221082cb44ce693dc27adb
3
+ size 8389798
ragatouille-kinya-colbert/indexes/agai-colbert-10000/collection.json ADDED
The diff for this file is too large to render. See raw diff
 
ragatouille-kinya-colbert/indexes/agai-colbert-10000/doclens.0.json ADDED
@@ -0,0 +1 @@
1
+ [134,456,454,450,127,318,46,181,195,352,127,353,46,280,66,164,74,91,435,122,171,108,356,186,73,97,128,126,101,50,64,83,160,107,73,84,85,17,56,17,112,70,17,83,17,118,114,117,117,56,47,98,179,107,153,146,24,64,319,445,253,98,337,272,98,344,85,101,471,87,112,123,172,131,104,97,472,384,168,195,92,18,23,167,233,70,180,77,132,135,66,202,208,375,186,173,391,211,83,294,122,59,422,92,213,62,58,32,98,118,107,66,431,98,122,39,178,245,139,73,138,222,270,87,109,180,187,213,282,343,308,275,470,289,28,82,91,305,117,76,215,31,41,70,61,196,95,32,131,206,125,130,137,117,183,238,132,147,65,109,109,77,76,122,158,159,164,44,89,106,83,73,142,66,131,198,146,46,163,147,73,164,191,54,154,129,94,85,189,71,166,128,69,66,100,44,99,106,163,135,109,194,61,36,74,95,117,25,40,53,278,48,40,87,130,67,91,110,62,106,114,91,131,134,57,96,56,50,260,184,203,469,137,173,130,298,225,75,107,161,116,235,418,122,65,129,338,168,420,156,43,464,465,246,454,463,380,323,479,474,127,129,114,107,426,315,431,353,162,237,52,50,78,22,77,118,260,206,283,78,125,256,101,465,276,169,215,230,266,160,95,358,277,129,468,249,221,81,31,42,84,188,95,189,158,124,153,204,145,158,102,427,287,76,162,430,242,344,135,129,121,132,120,155,48,81,470,420,300,400,166,118,112,66,187,186,228,314,153,116,208,138,132,54,83,469,316,240,279,95,338,280,335,263,227,294,271,99,169,90,116,160,120,299,113,330,401,212,130,252,271,345,260,299,269,314,54,395,138,218,105,246,88,453,122,116,98,80,191,58,164,124,140,87,149,183,104,147,325,16,70,189,197,36,106,37,26,67,162,76,121,132,71,138,278,216,208,208,87,79,16,91,73,92,252,123,167,67,54,119,173,333,200,85,319,142,226,216,237,366,311,366,315,259,193,218,119,232,346,94,155,93,442,243,198,149,270,190,386,199,207,292,287,138,162,226,205,179,105,105,230,69,91,74,76,332,144,221,133,75,467,285,196,96,101,366,331,443,99,383,181,45,176]
ragatouille-kinya-colbert/indexes/agai-colbert-10000/doclens.1.json ADDED
@@ -0,0 +1 @@
1
+ [196,116,210,274,205,285,379,179,176,166,110,217,199,200,290,111,376,198,113,255,196,283,198,226,214,218,197,241,80,380,170,198,185,354,238,320,259,280,94,62,161,265,296,76,196,342,260,207,291,135,237,233,77,309,364,486,138,124,177,52,164,184,218,173,201,193,72,150,230,235,379,196,322,351,384,171,284,29,111,196,24,178,266,232,23,362,73,173,37,195,421,246,360,204,326,390,278,146,142,188,433,305,278,280,84,36,17,55,159,69,111,257,147,327,359,339,344,17,275,287,242,156,210,327,159,125,213,54,170,160,310,436,165,76,162,197,480,65,275,220,269,140,131,186,167,305,136,21,258,90,13,16,102,155,208,265,74,362,94,180,201,23,49,20,21,14,128,56,23,141,33,14,263,179,103,427,205,174,229,238,306,274,205,167,116,483,86,403,84,242,321,477,19,214,252,466,435,85,231,185,313,129,314,77,81,474,449,475,370,154,462,394,245,139,88,86,118,121,19,200,148,38,82,104,146,71,143,421,174,151,94,125,180,218,256,71,343,271,69,481,44,126,36,86,140,84,102,249,25,55,94,153,190,114,304,123,61,137,243,407,90,34,37,14,36,32,36,32,35,32,47,58,55,48,47,22,143,478,79,129,85,221,162,294,77,92,103,66,224,76,339,94,24,68,234,77,116,100,304,148,236,86,450,111,84,205,122,255,462,246,245,345,211,460,172,118,185,14,106,201,173,18,144,97,96,27,376,169,411,476,456,212,222,221,243,448,479,482,363,210,201,134,212,238,224,230,192,169,347,460,465,87,285,391,288,311,320,245,389,444,460,472,312,355,316,196,387,84,95,198,165,404,481,223,161,147,223,471,86,206,366,197,467,414,351,417,215,203,278,453,396,295,277,162,148,282,252,149,452,326,72,84,329,304,383,474,232,447,429,195,150,190,373,83,119,106,34,178,112,118,236,142,124,225,252,256,359,159,52,63,61,109,29,137,131,186,126,197,476,139,74,451,84,124,101,467,474,475,467,471,276,474,473,463,314,180,243,288,469,416,202,399,450,455,480,140,339,344,331,241,160,270,225,212,318,97,275,290,340,358,272,312,146,201,486,435,289,170,293,188,340]
ragatouille-kinya-colbert/indexes/agai-colbert-10000/ivf.pid.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a654d8022b9bb5721f0e30696ed494b8dd538e19e6eb02b9a9e83251569ba0c
3
+ size 127896
ragatouille-kinya-colbert/indexes/agai-colbert-10000/metadata.json ADDED
@@ -0,0 +1,73 @@
1
+ {
2
+ "config":{
3
+ "query_token_id":"[unused0]",
4
+ "doc_token_id":"[unused1]",
5
+ "query_token":"[Q]",
6
+ "doc_token":"[D]",
7
+ "ncells":null,
8
+ "centroid_score_threshold":null,
9
+ "ndocs":null,
10
+ "load_index_with_mmap":false,
11
+ "index_path":null,
12
+ "index_bsize":32,
13
+ "nbits":4,
14
+ "kmeans_niters":20,
15
+ "resume":false,
16
+ "pool_factor":1,
17
+ "clustering_mode":"hierarchical",
18
+ "protected_tokens":0,
19
+ "similarity":"cosine",
20
+ "bsize":64,
21
+ "accumsteps":1,
22
+ "lr":0.000005,
23
+ "maxsteps":500000,
24
+ "save_every":5941,
25
+ "warmup":5941,
26
+ "warmup_bert":null,
27
+ "relu":false,
28
+ "nway":2,
29
+ "use_ib_negatives":true,
30
+ "reranker":false,
31
+ "distillation_alpha":1.0,
32
+ "ignore_scores":false,
33
+ "model_name":"AfroColBERT",
34
+ "query_maxlen":32,
35
+ "attend_to_mask_tokens":false,
36
+ "interaction":"colbert",
37
+ "dim":1024,
38
+ "doc_maxlen":512,
39
+ "mask_punctuation":true,
40
+ "checkpoint":"/mnt/DATA/AfroColBERT/.ragatouille/colbert/none/2025-05/23/11.52.12/checkpoints/colbert-10000/",
41
+ "triples":"/mnt/DATA/AfroColBERT/data/triples.train.colbert.jsonl",
42
+ "collection":[
43
+ "list with 984 elements starting with...",
44
+ [
45
+ "Inkoko irwaye irangwa n' ibi bikurikira : Iyo indwara yateye mu nkoko , ibigaragaza ubuzima bwazo bwiza birabura cyangwa bigahinduka . Usanga inkoko zigunze , zikonje , zisinzira , ndetse zimwe zahiniye amajosi mu mababa . Amababa n' ibirokoroko byijimye . Akenshi ntizirya cyangwa ugasanga ari imwe imwe itoratora . Hari ubwo usanga zose zihitwa ; hakaba n' ubwo zihitwa amaraso .",
+ "Uburyo indwara y' ubushita yandura mu nkoko : Ino ndwara iboneka mu nkoko zose ( into n' inkuru ) . Ni indwara mbi cyane iterwa na virusi . Inkoko zirwaye zanduza izindi . Ibikoresho byanduye nabyo bishobora kwanduza inkoko ino ndwara . Ibimenyetso by' indwara y' ubushita ( Fowl pox ) mu nkoko : Kugagara amaguru , ijosi n' amababa . Utubyimba duto kumaguru ; Igisunzu mumaso ; Kugira umuriro ; Udusebe mu kanwa no mu muhogo ; Gutakaza ibiro ( kunanuka / guhorota ) Kubyimba kwaho amababa atereye ; Ubuhumyi ; Guhombana kw' agatorero ; Kutarya no kutanywa ; Igabanuka ry' amagi ndetse n' impfu za hato nahato ; Ni iki wakora ngo ukumire indwara y' ubushita ( Fowl pox ) mu nkoko ; Kubera ko urukingo rw' iyi ndwara rutangwa ku mishwi ikivuka ( umunsi umwe ) ; umworozi asabwa kugura imishwi mu ituragiro ryizewe azi ko zayikingiwe ( akanahabwa icyemezo cyuko zakingiwe ) . Nta bundi buryo buhari bwo kuyirinda . Kwirinda no kuvura indwara y' Ubushita ( Fowl pox ) mu nkoko : Iyi ndwara nta muti igira . Kuyirinda ukurikiza amabwiriza yose y' isuku nicyo gisubizo cyonyine . Irinde kujya mu biraro by' inkoko bifite iyi ndwara . Mu gihe uguze izindi nkoko ugomba kuzishyira mu kato byibuze mu gihe cy' amezi 2 kugira ngo urebe niba nta burwayi zifite . Irinde imibu n' amasazi kuko bishobora kwanduza inkoko iyi ndwara .",
+ "Gukingira inkoko z' inyama ; Igihe inkoko zimaze iminsi 2 : Inkoko zikorerwa urukingo rwa Umuraramo ( New Castle ) . Igipimo cy' umuti ni New Castle HB 1 : agacupa k' inkingo 1000 bashyira muri litiro 20 z' amazi meza . Igihe inkoko zimaze iminsi 1 kugeza kuri 4 : Inkoko zihabwa Vitamine ( Anti - stress ) . Igipimo cy' umuti ni AMINOVIT : garama 1 bayivanga na litiro 1 y' amazi meza . Igihe inkoko zimaze iminsi 5 - 7 : Inkoko zikorerwa gukumira Kogusidiyoze . Igipimo cy' umuti ni VETACOX : garama 1 bayivanga na litiro 2 z' amazi meza . Igihe inkoko zimaze iminsi 7 : Inkoko zikorerwa urukingo rwa GUMBORO . Igipimo cy' umuti ni Cevac Gumbo L : agacupa k' inkingo 1000 bashyira muri litiro 20 z' amazi meza . Igihe inkoko zimaze iminsi 14 : Inkoko zikorerwa urukingo rwa Umuraramo ( New Castle ) . Igipimo cy' umuti ni Newcastle La sota : agacupa ka doze 1000 bashyira muri litiro 20 z' amazi meza . 6 . Igihe inkoko zimaze iminsi 17 - 19 : Inkoko zikorerwa gukumira Kogusidiyoze . Igipimo cy' umuti ni Amprolium : garama 1 bayivanga na litiro 1 y' amazi meza . 7 . Igihe inkoko zimaze iminsi 21 : Inkoko zikorerwa urukingo rwa GUMBORO . Igipimo cy' umuti ni Cevac Gumbo L : agacupa ka doze 1000 bashyira muri litiro 20 z' amazi meza ; Icyitonderwa : Igihe indwara igaragaye mu nkoko , umworozi agomba guhita ahamagara umuganga w' amatungo uri hafi wemewe . Ubutabazi bw' ibanze : Mu gihe ugitegereje umuganga w' amatungo nibyiza kuziha amavitamine , imyunyungugu , amazi meza n' indyo yuzuye no kuzitaho ."
+ ]
+ ],
+ "queries":"/mnt/DATA/AfroColBERT/data/queries.train.colbert.tsv",
+ "index_name":"agai-colbert-10000",
+ "overwrite":false,
+ "root":".ragatouille/",
+ "experiment":"colbert",
+ "index_root":null,
+ "name":"2025-05/23/18.01.47",
+ "rank":0,
+ "nranks":2,
+ "amp":true,
+ "gpus":2,
+ "avoid_fork_if_possible":false
+ },
+ "num_chunks":2,
+ "num_partitions":4096,
+ "num_embeddings":192748,
+ "avg_doclen":195.8821138211,
+ "RAGatouille":{
+ "index_config":{
+ "index_type":"PLAID",
+ "index_name":"agai-colbert-10000"
+ }
+ }
+ }
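The index-level fields in this metadata.json can be cross-checked against each other. The sketch below assumes that `avg_doclen` is the total number of token embeddings divided by the number of indexed passages. The diff does not state this, but the numbers shown are consistent with it (192748 / 984). The helper name `avg_doclen_is_consistent` is hypothetical and not part of ColBERT or RAGatouille.

```python
import json

# Index-level fields copied from the metadata.json diff. The passage count (984)
# comes from the "list with 984 elements" note in the "collection" entry.
METADATA_JSON = (
    '{"num_chunks": 2, "num_partitions": 4096, '
    '"num_embeddings": 192748, "avg_doclen": 195.8821138211}'
)
NUM_PASSAGES = 984


def avg_doclen_is_consistent(meta: dict, num_passages: int, tol: float = 1e-6) -> bool:
    """Return True if avg_doclen matches num_embeddings / num_passages within tol.

    Assumption: avg_doclen is the mean number of token embeddings per passage.
    """
    expected = meta["num_embeddings"] / num_passages
    return abs(expected - meta["avg_doclen"]) <= tol


if __name__ == "__main__":
    meta = json.loads(METADATA_JSON)
    print(avg_doclen_is_consistent(meta, NUM_PASSAGES))
```

A check like this may help detect an index whose metadata no longer matches the collection it was built from, for example after a partial re-upload.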
ragatouille-kinya-colbert/indexes/agai-colbert-10000/pid_docid_map.json ADDED
@@ -0,0 +1,986 @@
1
+ {
2
+ "0":"A000T639",
3
+ "1":"A001T643",
4
+ "2":"A002T641",
5
+ "3":"A003T640",
6
+ "4":"A004T637",
7
+ "5":"A005T636",
8
+ "6":"A006T634",
9
+ "7":"A007T633",
10
+ "8":"A008T632",
11
+ "9":"A009T628",
12
+ "10":"A010T627",
13
+ "11":"A011T624",
14
+ "12":"A012T621",
15
+ "13":"A013T619",
16
+ "14":"A014T618",
17
+ "15":"A015T614",
18
+ "16":"A016T612",
19
+ "17":"A017T611",
20
+ "18":"A018T610",
21
+ "19":"A019T620",
22
+ "20":"A020T001",
23
+ "21":"A021T563",
24
+ "22":"A022T565",
25
+ "23":"A023T580",
26
+ "24":"A024T578",
27
+ "25":"A025T602",
28
+ "26":"A026T603",
29
+ "27":"A027T575",
30
+ "28":"A028T002",
31
+ "29":"A029T003",
32
+ "30":"A030T004",
33
+ "31":"A031T005",
34
+ "32":"A032T006",
35
+ "33":"A033T007",
36
+ "34":"A034T008",
37
+ "35":"A035T009",
38
+ "36":"A036T011",
39
+ "37":"A037T014",
40
+ "38":"A038T023",
41
+ "39":"A039T033",
42
+ "40":"A040T026",
43
+ "41":"A041T027",
44
+ "42":"A042T030",
45
+ "43":"A043T025",
46
+ "44":"A044T010",
47
+ "45":"A045T017",
48
+ "46":"A046T018",
49
+ "47":"A047T020",
50
+ "48":"A048T021",
51
+ "49":"A049T022",
52
+ "50":"A050T024",
53
+ "51":"A051T035",
54
+ "52":"A052T036",
55
+ "53":"A053T037",
56
+ "54":"A054T040",
57
+ "55":"A055T041",
58
+ "56":"A056T043",
59
+ "57":"A057T044",
60
+ "58":"A058T046",
61
+ "59":"A059T047",
62
+ "60":"A060T048",
63
+ "61":"A061T049",
64
+ "62":"A062T063",
65
+ "63":"A063T052",
66
+ "64":"A064T053",
67
+ "65":"A065T050",
68
+ "66":"A066T054",
69
+ "67":"A067T056",
70
+ "68":"A068T055",
71
+ "69":"A069T057",
72
+ "70":"A070T058",
73
+ "71":"A071T059",
74
+ "72":"A072T060",
75
+ "73":"A073T061",
76
+ "74":"A074T062",
77
+ "75":"A075T064",
78
+ "76":"A076T066",
79
+ "77":"A077T067",
80
+ "78":"A078T068",
81
+ "79":"A079T069",
82
+ "80":"A080T045",
83
+ "81":"A081T644",
84
+ "82":"A082T645",
85
+ "83":"A083T070",
86
+ "84":"A084T646",
87
+ "85":"A085T647",
88
+ "86":"A086T648",
89
+ "87":"A087T071",
90
+ "88":"A088T649",
91
+ "89":"A089T072",
92
+ "90":"A090T650",
93
+ "91":"A091T073",
94
+ "92":"A092T074",
95
+ "93":"A093T075",
96
+ "94":"A094T076",
97
+ "95":"A095T077",
98
+ "96":"A096T078",
99
+ "97":"A097T079",
100
+ "98":"A098T080",
101
+ "99":"A099T651",
102
+ "100":"A100T652",
103
+ "101":"A101T081",
104
+ "102":"A102T082",
105
+ "103":"A103T084",
106
+ "104":"A104T085",
107
+ "105":"A105T086",
108
+ "106":"A106T087",
109
+ "107":"A107T653",
110
+ "108":"A108T654",
111
+ "109":"A109T019",
112
+ "110":"A110T088",
113
+ "111":"A111T089",
114
+ "112":"A112T090",
115
+ "113":"A113T091",
116
+ "114":"A114T092",
117
+ "115":"A115T093",
118
+ "116":"A116T094",
119
+ "117":"A117T095",
120
+ "118":"A118T096",
121
+ "119":"A119T097",
122
+ "120":"A120T098",
123
+ "121":"A121T100",
124
+ "122":"A122T102",
125
+ "123":"A123T103",
126
+ "124":"A124T104",
127
+ "125":"A125T105",
128
+ "126":"A126T106",
129
+ "127":"A127T107",
130
+ "128":"A128T108",
131
+ "129":"A129T109",
132
+ "130":"A130T110",
133
+ "131":"A131T111",
134
+ "132":"A132T112",
135
+ "133":"A133T113",
136
+ "134":"A134T114",
137
+ "135":"A135T115",
138
+ "136":"A136T116",
139
+ "137":"A137T117",
140
+ "138":"A138T118",
141
+ "139":"A139T119",
142
+ "140":"A140T120",
143
+ "141":"A141T121",
144
+ "142":"A142T123",
145
+ "143":"A143T125",
146
+ "144":"A144T124",
147
+ "145":"A145T126",
148
+ "146":"A146T127",
149
+ "147":"A147T099",
150
+ "148":"A148T128",
151
+ "149":"A149T129",
152
+ "150":"A150T130",
153
+ "151":"A151T131",
154
+ "152":"A152T132",
155
+ "153":"A153T133",
156
+ "154":"A154T134",
157
+ "155":"A155T135",
158
+ "156":"A156T136",
159
+ "157":"A157T137",
160
+ "158":"A158T138",
161
+ "159":"A159T139",
162
+ "160":"A160T140",
163
+ "161":"A161T141",
164
+ "162":"A162T143",
165
+ "163":"A163T144",
166
+ "164":"A164T145",
167
+ "165":"A165T146",
168
+ "166":"A166T147",
169
+ "167":"A167T148",
170
+ "168":"A168T149",
171
+ "169":"A169T150",
172
+ "170":"A170T151",
173
+ "171":"A171T152",
174
+ "172":"A172T153",
175
+ "173":"A173T154",
176
+ "174":"A174T155",
177
+ "175":"A175T156",
178
+ "176":"A176T157",
179
+ "177":"A177T158",
180
+ "178":"A178T159",
181
+ "179":"A179T160",
182
+ "180":"A180T161",
183
+ "181":"A181T162",
184
+ "182":"A182T163",
185
+ "183":"A183T164",
186
+ "184":"A184T165",
187
+ "185":"A185T166",
188
+ "186":"A186T167",
189
+ "187":"A187T168",
190
+ "188":"A188T169",
191
+ "189":"A189T170",
192
+ "190":"A190T171",
193
+ "191":"A191T172",
194
+ "192":"A192T173",
195
+ "193":"A193T174",
196
+ "194":"A194T175",
197
+ "195":"A195T176",
198
+ "196":"A196T177",
199
+ "197":"A197T178",
200
+ "198":"A198T179",
201
+ "199":"A199T180",
202
+ "200":"A200T181",
203
+ "201":"A201T182",
204
+ "202":"A202T183",
205
+ "203":"A203T184",
206
+ "204":"A204T186",
207
+ "205":"A205T187",
208
+ "206":"A206T185",
209
+ "207":"A207T188",
210
+ "208":"A208T189",
211
+ "209":"A209T190",
212
+ "210":"A210T191",
213
+ "211":"A211T192",
214
+ "212":"A212T193",
215
+ "213":"A213T194",
216
+ "214":"A214T195",
217
+ "215":"A215T196",
218
+ "216":"A216T197",
219
+ "217":"A217T198",
220
+ "218":"A218T199",
221
+ "219":"A219T200",
222
+ "220":"A220T201",
223
+ "221":"A221T202",
224
+ "222":"A222T203",
225
+ "223":"A223T204",
226
+ "224":"A224T206",
227
+ "225":"A225T207",
228
+ "226":"A226T208",
229
+ "227":"A227T209",
230
+ "228":"A228T210",
231
+ "229":"A229T655",
232
+ "230":"A230T656",
233
+ "231":"A231T657",
234
+ "232":"A232T658",
235
+ "233":"A233T659",
236
+ "234":"A234T660",
237
+ "235":"A235T661",
238
+ "236":"A236T662",
239
+ "237":"A237T663",
240
+ "238":"A238T664",
241
+ "239":"A239T665",
242
+ "240":"A240T667",
243
+ "241":"A241T668",
244
+ "242":"A242T669",
245
+ "243":"A243T670",
246
+ "244":"A244T671",
247
+ "245":"A245T672",
248
+ "246":"A246T673",
249
+ "247":"A247T674",
250
+ "248":"A248T675",
251
+ "249":"A249T676",
252
+ "250":"A250T677",
253
+ "251":"A251T678",
254
+ "252":"A252T683",
255
+ "253":"A253T684",
256
+ "254":"A254T685",
257
+ "255":"A255T686",
258
+ "256":"A256T687",
259
+ "257":"A257T688",
260
+ "258":"A258T689",
261
+ "259":"A259T690",
262
+ "260":"A260T691",
263
+ "261":"A261T692",
264
+ "262":"A262T693",
265
+ "263":"A263T694",
266
+ "264":"A264T695",
267
+ "265":"A265T696",
268
+ "266":"A266T697",
269
+ "267":"A267T698",
270
+ "268":"A268T699",
271
+ "269":"A269T700",
272
+ "270":"A270T701",
273
+ "271":"A271T702",
274
+ "272":"A272T703",
275
+ "273":"A273T704",
276
+ "274":"A274T705",
277
+ "275":"A275T706",
278
+ "276":"A276T707",
279
+ "277":"A277T708",
280
+ "278":"A278T709",
281
+ "279":"A279T710",
282
+ "280":"A280T711",
283
+ "281":"A281T712",
284
+ "282":"A282T713",
285
+ "283":"A283T714",
286
+ "284":"A284T715",
287
+ "285":"A285T716",
288
+ "286":"A286T717",
289
+ "287":"A287T718",
290
+ "288":"A288T719",
291
+ "289":"A289T720",
292
+ "290":"A290T721",
293
+ "291":"A291T722",
294
+ "292":"A292T723",
295
+ "293":"A293T724",
296
+ "294":"A294T725",
297
+ "295":"A295T726",
298
+ "296":"A296T727",
299
+ "297":"A297T728",
300
+ "298":"A298T729",
301
+ "299":"A299T730",
302
+ "300":"A300T731",
303
+ "301":"A301T732",
304
+ "302":"A302T733",
305
+ "303":"A303T734",
306
+ "304":"A304T735",
307
+ "305":"A305T736",
308
+ "306":"A306T737",
309
+ "307":"A307T738",
310
+ "308":"A308T739",
311
+ "309":"A309T740",
312
+ "310":"A310T741",
313
+ "311":"A311T742",
314
+ "312":"A312T743",
315
+ "313":"A313T744",
316
+ "314":"A314T745",
317
+ "315":"A315T746",
318
+ "316":"A316T747",
319
+ "317":"A317T748",
320
+ "318":"A318T749",
321
+ "319":"A319T750",
322
+ "320":"A320T751",
323
+ "321":"A321T752",
324
+ "322":"A322T753",
325
+ "323":"A323T754",
326
+ "324":"A324T755",
327
+ "325":"A325T756",
328
+ "326":"A326T757",
329
+ "327":"A327T758",
330
+ "328":"A328T759",
331
+ "329":"A329T760",
332
+ "330":"A330T761",
333
+ "331":"A331T762",
334
+ "332":"A332T763",
335
+ "333":"A333T764",
336
+ "334":"A334T765",
337
+ "335":"A335T766",
338
+ "336":"A336T767",
339
+ "337":"A337T768",
340
+ "338":"A338T769",
341
+ "339":"A339T770",
342
+ "340":"A340T771",
343
+ "341":"A341T772",
344
+ "342":"A342T773",
345
+ "343":"A343T774",
346
+ "344":"A344T775",
347
+ "345":"A345T776",
348
+ "346":"A346T777",
349
+ "347":"A347T778",
350
+ "348":"A348T779",
351
+ "349":"A349T780",
352
+ "350":"A350T781",
353
+ "351":"A351T782",
354
+ "352":"A352T783",
355
+ "353":"A353T784",
356
+ "354":"A354T785",
357
+ "355":"A355T786",
358
+ "356":"A356T787",
359
+ "357":"A357T788",
360
+ "358":"A358T789",
361
+ "359":"A359T790",
362
+ "360":"A360T791",
363
+ "361":"A361T792",
364
+ "362":"A362T793",
365
+ "363":"A363T794",
366
+ "364":"A364T795",
367
+ "365":"A365T796",
368
+ "366":"A366T797",
369
+ "367":"A367T798",
370
+ "368":"A368T799",
371
+ "369":"A369T800",
372
+ "370":"A370T801",
373
+ "371":"A371T802",
374
+ "372":"A372T803",
375
+ "373":"A373T804",
376
+ "374":"A374T805",
377
+ "375":"A375T806",
378
+ "376":"A376T807",
379
+ "377":"A377T808",
380
+ "378":"A378T809",
381
+ "379":"A379T810",
382
+ "380":"A380T811",
383
+ "381":"A381T812",
384
+ "382":"A382T813",
385
+ "383":"A383T814",
386
+ "384":"A384T815",
387
+ "385":"A385T816",
388
+ "386":"A386T817",
389
+ "387":"A387T818",
390
+ "388":"A388T819",
391
+ "389":"A389T820",
392
+ "390":"A390T821",
393
+ "391":"A391T823",
394
+ "392":"A392T824",
395
+ "393":"A393T825",
396
+ "394":"A394T827",
397
+ "395":"A395T828",
398
+ "396":"A396T829",
399
+ "397":"A397T830",
400
+ "398":"A398T831",
401
+ "399":"A399T832",
402
+ "400":"A400T833",
403
+ "401":"A401T834",
404
+ "402":"A402T835",
405
+ "403":"A403T211",
406
+ "404":"A404T212",
407
+ "405":"A405T213",
408
+ "406":"A406T214",
409
+ "407":"A407T215",
410
+ "408":"A408T216",
411
+ "409":"A409T217",
412
+ "410":"A410T218",
413
+ "411":"A411T219",
414
+ "412":"A412T220",
415
+ "413":"A413T221",
416
+ "414":"A414T223",
417
+ "415":"A415T224",
418
+ "416":"A416T225",
419
+ "417":"A417T226",
420
+ "418":"A418T227",
421
+ "419":"A419T228",
422
+ "420":"A420T229",
423
+ "421":"A421T230",
424
+ "422":"A422T231",
425
+ "423":"A423T232",
426
+ "424":"A424T233",
427
+ "425":"A425T234",
428
+ "426":"A426T235",
429
+ "427":"A427T236",
430
+ "428":"A428T237",
431
+ "429":"A429T238",
432
+ "430":"A430T239",
433
+ "431":"A431T240",
434
+ "432":"A432T241",
435
+ "433":"A433T242",
436
+ "434":"A434T243",
437
+ "435":"A435T245",
438
+ "436":"A436T244",
439
+ "437":"A437T246",
440
+ "438":"A438T247",
441
+ "439":"A439T248",
442
+ "440":"A440T249",
443
+ "441":"A441T250",
444
+ "442":"A442T251",
445
+ "443":"A443T252",
446
+ "444":"A444T253",
447
+ "445":"A445T254",
448
+ "446":"A446T255",
449
+ "447":"A447T256",
450
+ "448":"A448T257",
451
+ "449":"A449T258",
452
+ "450":"A450T259",
453
+ "451":"A451T260",
454
+ "452":"A452T261",
455
+ "453":"A453T262",
456
+ "454":"A454T263",
457
+ "455":"A455T264",
458
+ "456":"A456T265",
459
+ "457":"A457T266",
460
+ "458":"A458T267",
461
+ "459":"A459T268",
462
+ "460":"A460T269",
463
+ "461":"A461T270",
464
+ "462":"A462T271",
465
+ "463":"A463T272",
466
+ "464":"A464T273",
467
+ "465":"A465T274",
468
+ "466":"A466T275",
469
+ "467":"A467T276",
470
+ "468":"A468T277",
471
+ "469":"A469T278",
472
+ "470":"A470T279",
473
+ "471":"A471T280",
474
+ "472":"A472T281",
475
+ "473":"A473T282",
476
+ "474":"A474T283",
477
+ "475":"A475T284",
478
+ "476":"A476T285",
479
+ "477":"A477T286",
480
+ "478":"A478T287",
481
+ "479":"A479T288",
482
+ "480":"A480T289",
483
+ "481":"A481T291",
484
+ "482":"A482T293",
485
+ "483":"A483T292",
486
+ "484":"A484T294",
487
+ "485":"A485T295",
488
+ "486":"A486T296",
489
+ "487":"A487T297",
490
+ "488":"A488T299",
491
+ "489":"A489T300",
492
+ "490":"A490T301",
493
+ "491":"A491T302",
494
+ "492":"A492T303",
495
+ "493":"A493T304",
496
+ "494":"A494T305",
497
+ "495":"A495T306",
498
+ "496":"A496T307",
499
+ "497":"A497T308",
500
+ "498":"A498T309",
501
+ "499":"A499T310",
502
+ "500":"A500T311",
503
+ "501":"A501T312",
504
+ "502":"A502T313",
505
+ "503":"A503T314",
506
+ "504":"A504T315",
507
+ "505":"A505T316",
508
+ "506":"A506T317",
509
+ "507":"A507T318",
510
+ "508":"A508T319",
511
+ "509":"A509T320",
512
+ "510":"A510T321",
513
+ "511":"A511T322",
514
+ "512":"A512T323",
515
+ "513":"A513T324",
516
+ "514":"A514T325",
517
+ "515":"A515T326",
518
+ "516":"A516T327",
519
+ "517":"A517T328",
520
+ "518":"A518T329",
521
+ "519":"A519T330",
522
+ "520":"A520T331",
523
+ "521":"A521T332",
524
+ "522":"A522T333",
525
+ "523":"A523T334",
526
+ "524":"A524T335",
527
+ "525":"A525T336",
528
+ "526":"A526T337",
529
+ "527":"A527T338",
530
+ "528":"A528T339",
531
+ "529":"A529T340",
532
+ "530":"A530T341",
533
+ "531":"A531T342",
534
+ "532":"A532T343",
535
+ "533":"A533T344",
536
+ "534":"A534T345",
537
+ "535":"A535T346",
538
+ "536":"A536T347",
539
+ "537":"A537T349",
540
+ "538":"A538T350",
541
+ "539":"A539T351",
542
+ "540":"A540T352",
543
+ "541":"A541T353",
544
+ "542":"A542T354",
545
+ "543":"A543T355",
546
+ "544":"A544T356",
547
+ "545":"A545T357",
548
+ "546":"A546T358",
549
+ "547":"A547T359",
550
+ "548":"A548T360",
551
+ "549":"A549T361",
552
+ "550":"A550T362",
553
+ "551":"A551T363",
554
+ "552":"A552T364",
555
+ "553":"A553T365",
556
+ "554":"A554T366",
557
+ "555":"A555T367",
558
+ "556":"A556T368",
559
+ "557":"A557T369",
560
+ "558":"A558T370",
561
+ "559":"A559T371",
562
+ "560":"A560T372",
563
+ "561":"A561T373",
564
+ "562":"A562T374",
565
+ "563":"A563T375",
566
+ "564":"A564T376",
567
+ "565":"A565T377",
568
+ "566":"A566T378",
569
+ "567":"A567T379",
570
+ "568":"A568T380",
571
+ "569":"A569T381",
572
+ "570":"A570T382",
573
+ "571":"A571T383",
574
+ "572":"A572T384",
575
+ "573":"A573T385",
576
+ "574":"A574T386",
577
+ "575":"A575T387",
578
+ "576":"A576T388",
579
+ "577":"A577T389",
580
+ "578":"A578T390",
581
+ "579":"A579T391",
582
+ "580":"A580T392",
583
+ "581":"A581T393",
584
+ "582":"A582T394",
585
+ "583":"A583T395",
586
+ "584":"A584T396",
587
+ "585":"A585T397",
588
+ "586":"A586T398",
589
+ "587":"A587T399",
590
+ "588":"A588T400",
591
+ "589":"A589T401",
592
+ "590":"A590T402",
593
+ "591":"A591T403",
594
+ "592":"A592T404",
595
+ "593":"A593T405",
596
+ "594":"A594T406",
597
+ "595":"A595T407",
598
+ "596":"A596T408",
599
+ "597":"A597T409",
600
+ "598":"A598T410",
601
+ "599":"A599T411",
602
+ "600":"A600T412",
603
+ "601":"A601T413",
604
+ "602":"A602T414",
605
+ "603":"A603T415",
606
+ "604":"A604T416",
607
+ "605":"A605T417",
608
+ "606":"A606T418",
609
+ "607":"A607T419",
610
+ "608":"A608T420",
611
+ "609":"A609T421",
612
+ "610":"A610T422",
613
+ "611":"A611T423",
614
+ "612":"A612T424",
615
+ "613":"A613T425",
616
+ "614":"A614T426",
617
+ "615":"A615T427",
618
+ "616":"A616T428",
619
+ "617":"A617T429",
620
+ "618":"A618T430",
621
+ "619":"A619T431",
622
+ "620":"A620T433",
623
+ "621":"A621T434",
624
+ "622":"A622T435",
625
+ "623":"A623T436",
626
+ "624":"A624T437",
627
+ "625":"A625T438",
628
+ "626":"A626T439",
629
+ "627":"A627T440",
630
+ "628":"A628T441",
631
+ "629":"A629T442",
632
+ "630":"A630T443",
633
+ "631":"A631T444",
634
+ "632":"A632T445",
635
+ "633":"A633T446",
636
+ "634":"A634T447",
637
+ "635":"A635T448",
638
+ "636":"A636T449",
639
+ "637":"A637T450",
640
+ "638":"A638T451",
641
+ "639":"A639T452",
642
+ "640":"A640T453",
643
+ "641":"A641T454",
644
+ "642":"A642T455",
645
+ "643":"A643T456",
646
+ "644":"A644T457",
647
+ "645":"A645T458",
648
+ "646":"A646T459",
649
+ "647":"A647T460",
650
+ "648":"A648T461",
651
+ "649":"A649T463",
652
+ "650":"A650T464",
653
+ "651":"A651T462",
654
+ "652":"A652T465",
655
+ "653":"A653T466",
656
+ "654":"A654T467",
657
+ "655":"A655T468",
658
+ "656":"A656T470",
659
+ "657":"A657T469",
660
+ "658":"A658T471",
661
+ "659":"A659T472",
662
+ "660":"A660T473",
663
+ "661":"A661T474",
664
+ "662":"A662T475",
665
+ "663":"A663T476",
666
+ "664":"A664T477",
667
+ "665":"A665T479",
668
+ "666":"A666T480",
669
+ "667":"A667T481",
670
+ "668":"A668T482",
671
+ "669":"A669T483",
672
+ "670":"A670T484",
673
+ "671":"A671T478",
674
+ "672":"A672T502",
675
+ "673":"A673T492",
676
+ "674":"A674T498",
677
+ "675":"A675T485",
678
+ "676":"A676T486",
679
+ "677":"A677T488",
680
+ "678":"A678T490",
681
+ "679":"A679T489",
682
+ "680":"A680T495",
683
+ "681":"A681T487",
684
+ "682":"A682T497",
685
+ "683":"A683T491",
686
+ "684":"A684T499",
687
+ "685":"A685T496",
688
+ "686":"A686T493",
689
+ "687":"A687T503",
690
+ "688":"A688T494",
691
+ "689":"A689T500",
692
+ "690":"A690T509",
693
+ "691":"A691T510",
694
+ "692":"A692T511",
695
+ "693":"A693T512",
696
+ "694":"A694T513",
697
+ "695":"A695T514",
698
+ "696":"A696T515",
699
+ "697":"A697T516",
700
+ "698":"A698T517",
701
+ "699":"A699T501",
702
+ "700":"A700T504",
703
+ "701":"A701T505",
704
+ "702":"A702T506",
705
+ "703":"A703T507",
706
+ "704":"A704T508",
707
+ "705":"A705T518",
708
+ "706":"A706T519",
709
+ "707":"A707T520",
710
+ "708":"A708T521",
711
+ "709":"A709T522",
712
+ "710":"A710T524",
713
+ "711":"A711T525",
714
+ "712":"A712T526",
715
+ "713":"A713T527",
716
+ "714":"A714T528",
717
+ "715":"A715T529",
718
+ "716":"A716T530",
719
+ "717":"A717T531",
720
+ "718":"A718T532",
721
+ "719":"A719T533",
722
+ "720":"A720T534",
723
+ "721":"A721T535",
724
+ "722":"A722T536",
725
+ "723":"A723T537",
726
+ "724":"A724T538",
727
+ "725":"A725T539",
728
+ "726":"A726T540",
729
+ "727":"A727T541",
730
+ "728":"A728T543",
731
+ "729":"A729T544",
732
+ "730":"A730T545",
733
+ "731":"A731T546",
734
+ "732":"A732T547",
735
+ "733":"A733T548",
736
+ "734":"A734T549",
737
+ "735":"A735T550",
738
+ "736":"A736T551",
739
+ "737":"A737T553",
740
+ "738":"A738T554",
741
+ "739":"A739T555",
742
+ "740":"A740T556",
743
+ "741":"A741T557",
744
+ "742":"A742T558",
745
+ "743":"A743T560",
746
+ "744":"A744T561",
747
+ "745":"A745T562",
748
+ "746":"A746T564",
749
+ "747":"A747T566",
750
+ "748":"A748T572",
751
+ "749":"A749T573",
752
+ "750":"A750T574",
753
+ "751":"A751T576",
754
+ "752":"A752T577",
755
+ "753":"A753T579",
756
+ "754":"A754T582",
757
+ "755":"A755T584",
758
+ "756":"A756T583",
759
+ "757":"A757T585",
760
+ "758":"A758T586",
761
+ "759":"A759T587",
762
+ "760":"A760T588",
763
+ "761":"A761T589",
764
+ "762":"A762T590",
765
+ "763":"A763T591",
766
+ "764":"A764T592",
767
+ "765":"A765T593",
768
+ "766":"A766T594",
769
+ "767":"A767T595",
770
+ "768":"A768T596",
771
+ "769":"A769T597",
772
+ "770":"A770T598",
773
+ "771":"A771T599",
774
+ "772":"A772T600",
775
+ "773":"A773T604",
776
+ "774":"A774T542",
777
+ "775":"A775T607",
778
+ "776":"A776T606",
779
+ "777":"A777T608",
780
+ "778":"A778T609",
781
+ "779":"A779T615",
782
+ "780":"A780T617",
783
+ "781":"A781T622",
784
+ "782":"A782T623",
785
+ "783":"A783T625",
786
+ "784":"A784T626",
787
+ "785":"A785T629",
788
+ "786":"A786T630",
789
+ "787":"A787T631",
790
+ "788":"A788T635",
791
+ "789":"A789T638",
792
+ "790":"A790T568",
793
+ "791":"A791T571",
794
+ "792":"A792T570",
795
+ "793":"A793T569",
796
+ "794":"A794T567",
797
+ "795":"A795T581",
798
+ "796":"A796T605",
799
+ "797":"A797T523",
800
+ "798":"A798T836",
801
+ "799":"A799T837",
802
+ "800":"A800T838",
803
+ "801":"A801T839",
804
+ "802":"A802T840",
805
+ "803":"A803T841",
806
+ "804":"A804T842",
807
+ "805":"A805T843",
808
+ "806":"A806T844",
809
+ "807":"A807T845",
810
+ "808":"A808T846",
811
+ "809":"A809T847",
812
+ "810":"A810T848",
813
+ "811":"A811T849",
814
+ "812":"A812T850",
815
+ "813":"A813T851",
816
+ "814":"A814T852",
817
+ "815":"A815T853",
818
+ "816":"A816T854",
819
+ "817":"A817T855",
820
+ "818":"A818T856",
821
+ "819":"A819T857",
822
+ "820":"A820T858",
823
+ "821":"A821T859",
824
+ "822":"A822T860",
825
+ "823":"A823T861",
826
+ "824":"A824T862",
827
+ "825":"A825T863",
828
+ "826":"A826T864",
829
+ "827":"A827T865",
830
+ "828":"A828T866",
831
+ "829":"A829T867",
832
+ "830":"A830T868",
833
+ "831":"A831T869",
834
+ "832":"A832T870",
835
+ "833":"A833T871",
836
+ "834":"A834T872",
837
+ "835":"A835T873",
838
+ "836":"A836T874",
839
+ "837":"A837T875",
840
+ "838":"A838T876",
841
+ "839":"A839T877",
842
+ "840":"A840T878",
843
+ "841":"A841T879",
844
+ "842":"A842T880",
845
+ "843":"A843T881",
846
+ "844":"A844T882",
847
+ "845":"A845T883",
848
+ "846":"A846T885",
849
+ "847":"A847T886",
850
+ "848":"A848T887",
851
+ "849":"A849T888",
852
+ "850":"A850T889",
853
+ "851":"A851T890",
854
+ "852":"A852T891",
855
+ "853":"A853T892",
856
+ "854":"A854T894",
857
+ "855":"A855T895",
858
+ "856":"A856T896",
859
+ "857":"A857T897",
860
+ "858":"A858T898",
861
+ "859":"A859T899",
862
+ "860":"A860T900",
863
+ "861":"A861T901",
864
+ "862":"A862T902",
865
+ "863":"A863T903",
866
+ "864":"A864T904",
867
+ "865":"A865T905",
868
+ "866":"A866T906",
869
+ "867":"A867T907",
870
+ "868":"A868T908",
871
+ "869":"A869T909",
872
+ "870":"A870T910",
873
+ "871":"A871T911",
874
+ "872":"A872T912",
875
+ "873":"A873T913",
876
+ "874":"A874T914",
877
+ "875":"A875T915",
878
+ "876":"A876T916",
879
+ "877":"A877T917",
880
+ "878":"A878T918",
881
+ "879":"A879T919",
882
+ "880":"A880T920",
883
+ "881":"A881T921",
884
+ "882":"A882T922",
885
+ "883":"A883T923",
886
+ "884":"A884T924",
887
+ "885":"A885T925",
888
+ "886":"A886T927",
889
+ "887":"A887T928",
890
+ "888":"A888T929",
891
+ "889":"A889T930",
892
+ "890":"A890T931",
893
+ "891":"A891T932",
894
+ "892":"A892T933",
895
+ "893":"A893T935",
896
+ "894":"A894T937",
897
+ "895":"A895T938",
898
+ "896":"A896T939",
899
+ "897":"A897T940",
900
+ "898":"A898T941",
901
+ "899":"A899T942",
902
+ "900":"A900T943",
903
+ "901":"A901T944",
904
+ "902":"A902T945",
905
+ "903":"A903T946",
906
+ "904":"A904T947",
907
+ "905":"A905T948",
908
+ "906":"A906T949",
909
+ "907":"A907T950",
910
+ "908":"A908T951",
911
+ "909":"A909T952",
912
+ "910":"A910T953",
913
+ "911":"A911T954",
914
+ "912":"A912T955",
915
+ "913":"A913T956",
916
+ "914":"A914T958",
917
+ "915":"A915T957",
918
+ "916":"A916T959",
919
+ "917":"A917T961",
920
+ "918":"A918T962",
921
+ "919":"A919T963",
922
+ "920":"A920T964",
923
+ "921":"A921T965",
924
+ "922":"A922T966",
925
+ "923":"A923T967",
926
+ "924":"A924T968",
927
+ "925":"A925T969",
928
+ "926":"A926T970",
929
+ "927":"A927T971",
930
+ "928":"A928T972",
931
+ "929":"A929T973",
932
+ "930":"A930T974",
933
+ "931":"A931T975",
934
+ "932":"A932T976",
935
+ "933":"A933T977",
936
+ "934":"A934T978",
937
+ "935":"A935T979",
938
+ "936":"A936T980",
939
+ "937":"A937T981",
940
+ "938":"A938T982",
941
+ "939":"A939T983",
942
+ "940":"A940T984",
943
+ "941":"A941T985",
944
+ "942":"A942T986",
945
+ "943":"A943T987",
946
+ "944":"A944T988",
947
+ "945":"A945T990",
948
+ "946":"A946T991",
949
+ "947":"A947T992",
950
+ "948":"A948T993",
951
+ "949":"A949T994",
952
+ "950":"A950T995",
953
+ "951":"A951T996",
954
+ "952":"A952T997",
955
+ "953":"A953T999",
956
+ "954":"A954T1000",
957
+ "955":"A955T1002",
958
+ "956":"A956T1003",
959
+ "957":"A957T1004",
960
+ "958":"A958T1005",
961
+ "959":"A959T1006",
962
+ "960":"A960T1007",
963
+ "961":"A961T1008",
964
+ "962":"A962T1009",
965
+ "963":"A963T1010",
966
+ "964":"A964T1012",
967
+ "965":"A965T1011",
968
+ "966":"A966T1013",
969
+ "967":"A967T1014",
970
+ "968":"A968T1015",
971
+ "969":"A969T1016",
972
+ "970":"A970T1017",
973
+ "971":"A971T1018",
974
+ "972":"A972T1019",
975
+ "973":"A973T1020",
976
+ "974":"A974T1021",
977
+ "975":"A975T1022",
978
+ "976":"A976T1023",
979
+ "977":"A977T1024",
980
+ "978":"A978T1025",
981
+ "979":"A979T1026",
982
+ "980":"A980T1027",
983
+ "981":"A981T1028",
984
+ "982":"A982T1029",
985
+ "983":"A983T348"
986
+ }
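The document IDs in pid_docid_map.json follow a visible `A<number>T<number>` pattern, and in every entry shown the `A` part equals the passage ID key. The sketch below splits an ID into its two numeric parts and checks that property. What the `A` and `T` parts mean is not documented in this diff, so the names `split_doc_id` and `a_part_matches_pid` and the interpretation of the two numbers are assumptions for illustration only.

```python
import re

# Pattern inferred from the entries above, e.g. "A000T639" or "A954T1000".
DOC_ID_PATTERN = re.compile(r"^A(\d+)T(\d+)$")


def split_doc_id(doc_id: str) -> tuple[int, int]:
    """Split an ID such as 'A000T639' into its two numeric parts (0, 639)."""
    match = DOC_ID_PATTERN.match(doc_id)
    if match is None:
        raise ValueError(f"unexpected document ID format: {doc_id!r}")
    return int(match.group(1)), int(match.group(2))


def a_part_matches_pid(pid_docid_map: dict[str, str]) -> bool:
    """Return True if the 'A' number of every document ID equals its passage ID key."""
    return all(
        split_doc_id(doc_id)[0] == int(pid) for pid, doc_id in pid_docid_map.items()
    )


# A few entries copied from the mapping above.
SAMPLE_MAP = {"0": "A000T639", "20": "A020T001", "954": "A954T1000", "983": "A983T348"}

if __name__ == "__main__":
    print(split_doc_id("A000T639"))
    print(a_part_matches_pid(SAMPLE_MAP))
```

To run the check on the full file, load it with `json.load` and pass the resulting dictionary to `a_part_matches_pid`.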
ragatouille-kinya-colbert/indexes/agai-colbert-10000/plan.json ADDED
@@ -0,0 +1,67 @@
+ {
+ "config": {
+ "query_token_id": "[unused0]",
+ "doc_token_id": "[unused1]",
+ "query_token": "[Q]",
+ "doc_token": "[D]",
+ "ncells": null,
+ "centroid_score_threshold": null,
+ "ndocs": null,
+ "load_index_with_mmap": false,
+ "index_path": null,
+ "index_bsize": 32,
+ "nbits": 4,
+ "kmeans_niters": 20,
+ "resume": false,
+ "pool_factor": 1,
+ "clustering_mode": "hierarchical",
+ "protected_tokens": 0,
+ "similarity": "cosine",
+ "bsize": 64,
+ "accumsteps": 1,
+ "lr": 5e-6,
+ "maxsteps": 500000,
+ "save_every": 5941,
+ "warmup": 5941,
+ "warmup_bert": null,
+ "relu": false,
+ "nway": 2,
+ "use_ib_negatives": true,
+ "reranker": false,
+ "distillation_alpha": 1.0,
+ "ignore_scores": false,
+ "model_name": "AfroColBERT",
+ "query_maxlen": 32,
+ "attend_to_mask_tokens": false,
+ "interaction": "colbert",
+ "dim": 1024,
+ "doc_maxlen": 512,
+ "mask_punctuation": true,
+ "checkpoint": "\/mnt\/DATA\/AfroColBERT\/.ragatouille\/colbert\/none\/2025-05\/23\/11.52.12\/checkpoints\/colbert-10000\/",
+ "triples": "\/mnt\/DATA\/AfroColBERT\/data\/triples.train.colbert.jsonl",
+ "collection": [
+ "list with 984 elements starting with...",
+ [
+ "Inkoko irwaye irangwa n' ibi bikurikira : Iyo indwara yateye mu nkoko , ibigaragaza ubuzima bwazo bwiza birabura cyangwa bigahinduka . Usanga inkoko zigunze , zikonje , zisinzira , ndetse zimwe zahiniye amajosi mu mababa . Amababa n' ibirokoroko byijimye . Akenshi ntizirya cyangwa ugasanga ari imwe imwe itoratora . Hari ubwo usanga zose zihitwa ; hakaba n' ubwo zihitwa amaraso .",
+ "Uburyo indwara y' ubushita yandura mu nkoko : Ino ndwara iboneka mu nkoko zose ( into n' inkuru ) . Ni indwara mbi cyane iterwa na virusi . Inkoko zirwaye zanduza izindi . Ibikoresho byanduye nabyo bishobora kwanduza inkoko ino ndwara . Ibimenyetso by' indwara y' ubushita ( Fowl pox ) mu nkoko : Kugagara amaguru , ijosi n' amababa . Utubyimba duto kumaguru ; Igisunzu mumaso ; Kugira umuriro ; Udusebe mu kanwa no mu muhogo ; Gutakaza ibiro ( kunanuka \/ guhorota ) Kubyimba kwaho amababa atereye ; Ubuhumyi ; Guhombana kw' agatorero ; Kutarya no kutanywa ; Igabanuka ry' amagi ndetse n' impfu za hato nahato ; Ni iki wakora ngo ukumire indwara y' ubushita ( Fowl pox ) mu nkoko ; Kubera ko urukingo rw' iyi ndwara rutangwa ku mishwi ikivuka ( umunsi umwe ) ; umworozi asabwa kugura imishwi mu ituragiro ryizewe azi ko zayikingiwe ( akanahabwa icyemezo cyuko zakingiwe ) . Nta bundi buryo buhari bwo kuyirinda . Kwirinda no kuvura indwara y' Ubushita ( Fowl pox ) mu nkoko : Iyi ndwara nta muti igira . Kuyirinda ukurikiza amabwiriza yose y' isuku nicyo gisubizo cyonyine . Irinde kujya mu biraro by' inkoko bifite iyi ndwara . Mu gihe uguze izindi nkoko ugomba kuzishyira mu kato byibuze mu gihe cy' amezi 2 kugira ngo urebe niba nta burwayi zifite . Irinde imibu n' amasazi kuko bishobora kwanduza inkoko iyi ndwara .",
+ "Gukingira inkoko z' inyama ; Igihe inkoko zimaze iminsi 2 : Inkoko zikorerwa urukingo rwa Umuraramo ( New Castle ) . Igipimo cy' umuti ni New Castle HB 1 : agacupa k' inkingo 1000 bashyira muri litiro 20 z' amazi meza . Igihe inkoko zimaze iminsi 1 kugeza kuri 4 : Inkoko zihabwa Vitamine ( Anti - stress ) . Igipimo cy' umuti ni AMINOVIT : garama 1 bayivanga na litiro 1 y' amazi meza . Igihe inkoko zimaze iminsi 5 - 7 : Inkoko zikorerwa gukumira Kogusidiyoze . Igipimo cy' umuti ni VETACOX : garama 1 bayivanga na litiro 2 z' amazi meza . Igihe inkoko zimaze iminsi 7 : Inkoko zikorerwa urukingo rwa GUMBORO . Igipimo cy' umuti ni Cevac Gumbo L : agacupa k' inkingo 1000 bashyira muri litiro 20 z' amazi meza . Igihe inkoko zimaze iminsi 14 : Inkoko zikorerwa urukingo rwa Umuraramo ( New Castle ) . Igipimo cy' umuti ni Newcastle La sota : agacupa ka doze 1000 bashyira muri litiro 20 z' amazi meza . 6 . Igihe inkoko zimaze iminsi 17 - 19 : Inkoko zikorerwa gukumira Kogusidiyoze . Igipimo cy' umuti ni Amprolium : garama 1 bayivanga na litiro 1 y' amazi meza . 7 . Igihe inkoko zimaze iminsi 21 : Inkoko zikorerwa urukingo rwa GUMBORO . Igipimo cy' umuti ni Cevac Gumbo L : agacupa ka doze 1000 bashyira muri litiro 20 z' amazi meza ; Icyitonderwa : Igihe indwara igaragaye mu nkoko , umworozi agomba guhita ahamagara umuganga w' amatungo uri hafi wemewe . Ubutabazi bw' ibanze : Mu gihe ugitegereje umuganga w' amatungo nibyiza kuziha amavitamine , imyunyungugu , amazi meza n' indyo yuzuye no kuzitaho ."
+ ]
+ ],
+ "queries": "\/mnt\/DATA\/AfroColBERT\/data\/queries.train.colbert.tsv",
+ "index_name": "agai-colbert-10000",
+ "overwrite": false,
+ "root": ".ragatouille\/",
+ "experiment": "colbert",
+ "index_root": null,
+ "name": "2025-05\/23\/18.01.47",
+ "rank": 0,
+ "nranks": 2,
+ "amp": true,
+ "gpus": 2,
+ "avoid_fork_if_possible": false
+ },
+ "num_chunks": 2,
+ "num_partitions": 4096,
+ "num_embeddings_est": 192790.78857421875,
+ "avg_doclen_est": 195.92559814453125
+ }
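
Note on the plan.json statistics above: the three summary fields appear to be related arithmetically. The sketch below is a minimal check, not the indexer's actual code. It assumes that `num_embeddings_est` is the collection size (984 passages, per the `collection` entry) multiplied by `avg_doclen_est`, and that `num_partitions` is the largest power of two not exceeding 16 times the square root of that estimate. The second rule is an assumption based on how ColBERT-style indexers commonly size their centroid set; the function names are hypothetical.

```python
import math


def estimate_num_embeddings(num_docs: int, avg_doclen_est: float) -> float:
    """Estimated total token embeddings: passages times average passage length."""
    return num_docs * avg_doclen_est


def estimate_num_partitions(num_embeddings_est: float) -> int:
    """Assumed rule: largest power of two <= 16 * sqrt(estimated embeddings)."""
    return int(2 ** math.floor(math.log2(16 * math.sqrt(num_embeddings_est))))


# Values copied from plan.json; 984 comes from "list with 984 elements".
num_docs = 984
avg_doclen_est = 195.92559814453125

num_embeddings_est = estimate_num_embeddings(num_docs, avg_doclen_est)
num_partitions = estimate_num_partitions(num_embeddings_est)

print(f"num_embeddings_est = {num_embeddings_est}")
print(f"num_partitions     = {num_partitions}")
```

Under these assumptions the computed values can be compared directly with the `num_embeddings_est` and `num_partitions` fields recorded in the file.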