colbert_unigram / README.md
---
language:
  - vi
  - ede
tags:
  - cross-lingual-retrieval
  - sentencepiece-tokenizer
  - colbert
  - EViRAL
---

ColBERT + SentencePiece — EViRAL

Task: Ede query → Vietnamese passage retrieval
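
For readers unfamiliar with ColBERT-style retrieval, the sketch below illustrates late-interaction (MaxSim) scoring between the token embeddings of an Ede query and those of Vietnamese passages. It is a minimal pure-Python illustration, not the code used to train or evaluate this model. The function names `maxsim_score` and `rank_passages` and the toy 2-dimensional vectors are hypothetical, and it assumes the encoder outputs one embedding per token.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query_vecs: Sequence[Vector], passage_vecs: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: for each query token, take the maximum
    cosine similarity over all passage tokens, then sum over query tokens."""
    if not passage_vecs:
        return 0.0
    q = [l2_normalize(v) for v in query_vecs]
    p = [l2_normalize(v) for v in passage_vecs]
    score = 0.0
    for qv in q:
        score += max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
    return score


def rank_passages(
    query_vecs: Sequence[Vector], passages: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every passage against the query and sort by descending score."""
    scored = [(pid, maxsim_score(query_vecs, vecs)) for pid, vecs in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs (hypothetical values).
    query = [[1.0, 0.0], [0.0, 1.0]]
    passages = {
        "vi_passage_a": [[1.0, 0.0], [0.0, 1.0]],
        "vi_passage_b": [[1.0, 0.0]],
    }
    for pid, score in rank_passages(query, passages):
        print(pid, round(score, 4))
```

In practice the per-token vectors would come from one of the checkpoints listed under Checkpoints, applied to text tokenized with the SentencePiece model.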

Eval Results

| Metric  | Validation | Test   |
|---------|------------|--------|
| nDCG@1  | 0.0004     | 0.0004 |
| nDCG@5  | 0.0009     | 0.0011 |
| nDCG@10 | 0.0018     | 0.0019 |
| MRR@10  | 0.0019     | 0.0020 |
| R@50    | 0.0204     | 0.0206 |
| R@100   | 0.0370     | 0.0389 |
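
The card does not say how these metrics were computed. As a reference for interpreting them, the sketch below shows the standard definitions of nDCG@k, MRR@k and Recall@k. It assumes binary relevance labels and averaging over queries, and the helper names and toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, pid in enumerate(ranked[:k])
        if pid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for i, pid in enumerate(ranked[:k]):
        if pid in relevant:
            return 1.0 / (i + 1)
    return 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_over_queries(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    metric: Callable[[List[str], Set[str], int], float],
    k: int,
) -> float:
    """Average a per-query metric over all queries that have relevance labels."""
    if not qrels:
        return 0.0
    total = sum(metric(runs.get(qid, []), rel, k) for qid, rel in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    # Toy run: ranked passage ids per query, and the relevant ids per query.
    runs = {"q1": ["p3", "p1", "p2"], "q2": ["p9"]}
    qrels = {"q1": {"p1"}, "q2": {"p9"}}
    print("nDCG@10:", mean_over_queries(runs, qrels, ndcg_at_k, 10))
    print("MRR@10:", mean_over_queries(runs, qrels, mrr_at_k, 10))
    print("R@50:", mean_over_queries(runs, qrels, recall_at_k, 50))
```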

Checkpoints

| File                   | Description                                  |
|------------------------|----------------------------------------------|
| mlm.pt                 | MLM pre-trained encoder                      |
| align.pt               | cross-lingual aligned encoder                |
| finetune.pt            | contrastive fine-tuned encoder (best val)    |
| sp_tokenizer/spm.model | SentencePiece model                          |
| sp_tokenizer/spm.vocab | SentencePiece vocab                          |