---
language:
- vi
- ede
tags:
- cross-lingual-retrieval
- sentencepiece-tokenizer
- colbert
- EViRAL
---

# ColBERT + SentencePiece - EViRAL

Task: Ede query → Vietnamese passage retrieval

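For orientation, here is a minimal sketch of how a query could be scored against passages, assuming this model follows the standard ColBERT late-interaction (MaxSim) scheme: each query token embedding is matched to its most similar passage token embedding, and the maxima are summed. The function names (`maxsim_score`, `rank_passages`) and the toy 2-dimensional vectors are illustrative only and are not taken from this repository.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], passage_vecs: List[Vector]) -> float:
    """Late-interaction score: for every query token, take the best-matching
    passage token (max dot product), then sum over query tokens."""
    if not query_vecs or not passage_vecs:
        return 0.0
    return sum(max(dot(q, p) for p in passage_vecs) for q in query_vecs)


def rank_passages(
    query_vecs: List[Vector], passages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every passage and return (passage_id, score) pairs, best first."""
    scored = [(pid, maxsim_score(query_vecs, vecs)) for pid, vecs in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy example with hypothetical token embeddings (not produced by the real encoder).
query = [[1.0, 0.0], [0.0, 1.0]]
passages = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.5, 0.5]],
}
ranking = rank_passages(query, passages)
```

In a real setup the token embeddings would come from the encoder applied to SentencePiece-tokenized Ede queries and Vietnamese passages; whether the embeddings are L2-normalized before scoring is not stated in this card.
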
## Eval Results

| Metric  | Validation | Test   |
|---------|------------|--------|
| nDCG@1  | 0.0004     | 0.0004 |
| nDCG@5  | 0.0009     | 0.0011 |
| nDCG@10 | 0.0018     | 0.0019 |
| MRR@10  | 0.0019     | 0.0020 |
| R@50    | 0.0204     | 0.0206 |
| R@100   | 0.0370     | 0.0389 |

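As a reading aid for the metrics in the table, the sketch below shows one common way to compute nDCG@k, MRR@k, and Recall@k for a single query. It assumes binary relevance labels, which this card does not confirm, and it is not the evaluation script used to produce the numbers above. Reported scores would normally be the mean of these per-query values over all queries in a split.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def mrr_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """nDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked[:k], start=1)
        if pid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one query: the only relevant passage is at rank 2.
ranked_ids = ["p3", "p1", "p2"]
relevant_ids = {"p1"}
per_query = {
    "nDCG@10": ndcg_at_k(ranked_ids, relevant_ids, 10),
    "MRR@10": mrr_at_k(ranked_ids, relevant_ids, 10),
    "R@1": recall_at_k(ranked_ids, relevant_ids, 1),
}
```
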
|
| ## Checkpoints |
| | file | description | |
| |---|---| |
| | mlm.pt | MLM pre-trained encoder | |
| | align.pt | cross-lingual aligned encoder | |
| | finetune.pt | contrastive fine-tuned encoder (best val) | |
| | sp_tokenizer/spm.model | SentencePiece model | |
| | sp_tokenizer/spm.vocab | SentencePiece vocab | |
|
|
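The files listed above could be loaded along the following lines. This is a sketch under assumptions: the repository is assumed to be downloaded to a local directory, and the internal layout of the `.pt` files (plain `state_dict`, wrapped dictionary, or full pickled module) is not documented in this card, so `load_checkpoint` only returns whatever `torch.load` yields. The helper names (`resolve_files`, `load_tokenizer`, `load_checkpoint`) are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict

# Relative paths exactly as listed in the checkpoint table.
CHECKPOINT_FILES = {
    "mlm": "mlm.pt",
    "align": "align.pt",
    "finetune": "finetune.pt",
    "spm_model": "sp_tokenizer/spm.model",
    "spm_vocab": "sp_tokenizer/spm.vocab",
}


def resolve_files(repo_dir: str) -> Dict[str, Path]:
    """Map each logical name to its path inside a local copy of the repo."""
    root = Path(repo_dir)
    return {name: root / rel for name, rel in CHECKPOINT_FILES.items()}


def load_tokenizer(repo_dir: str):
    """Load the SentencePiece model (requires `pip install sentencepiece`)."""
    import sentencepiece as spm  # imported lazily so the helpers above work without it

    model_path = resolve_files(repo_dir)["spm_model"]
    return spm.SentencePieceProcessor(model_file=str(model_path))


def load_checkpoint(repo_dir: str, stage: str = "finetune"):
    """Load one of the .pt files on CPU (requires PyTorch).

    Assumption: the file is readable by torch.load with default settings.
    Recent PyTorch versions default to weights_only=True, which fails if the
    file contains arbitrary pickled objects rather than tensors.
    """
    import torch

    return torch.load(resolve_files(repo_dir)[stage], map_location="cpu")


paths = resolve_files("path/to/local/repo")  # placeholder directory
```

With the files in place, `load_tokenizer(repo_dir).encode(text, out_type=int)` would return SentencePiece token ids for a query or passage. The encoder class needed to consume the loaded weights is not included here because this card does not describe it.
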