---
language:
- vi
- ede
tags:
- cross-lingual-retrieval
- morpheme-tokenizer
- vanilla-transformer
- EViRAL
---

# Vanilla Transformer + Morpheme Tokenizer - EViRAL

- Task: Ede query → Vietnamese passage retrieval
- Config: 6 layers, hidden size 512, 8 attention heads, FFN size 2048
- Tokenizer: corpus-driven morpheme segmentation plus an Ede-only synonym buffer (Vietnamese as pivot)

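
The Config values imply a per-head dimension of 512 / 8 = 64. The sketch below records the four hyperparameters in a small dataclass and estimates the size of the encoder stack. The class name `EncoderConfig` and its field names are illustrative and not taken from the training code. The parameter formula assumes a standard Transformer encoder layer with bias terms and two LayerNorms per layer, so the real implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values copied from the Config line of this card.
    num_layers: int = 6
    hidden_size: int = 512
    num_heads: int = 8
    ffn_size: int = 2048

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def layer_params(self) -> int:
        # Assumption: a standard encoder layer with biases on every linear map.
        d, f = self.hidden_size, self.ffn_size
        attention = 4 * (d * d + d)               # Q, K, V and output projections
        feed_forward = (d * f + f) + (f * d + d)  # two linear layers
        layer_norms = 2 * (2 * d)                 # two LayerNorms, weight + bias each
        return attention + feed_forward + layer_norms

    def encoder_params(self) -> int:
        # Layer stack only: embeddings and any output head are excluded.
        return self.num_layers * self.layer_params()


config = EncoderConfig()
print(config.head_dim, config.layer_params(), config.encoder_params())
```

Under these assumptions each layer holds 3,152,384 parameters, so the six-layer stack holds 18,914,304, not counting embeddings.
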
## Checkpoints

| file | description |
|---|---|
| mlm.pt | MLM pre-trained encoder |
| align.pt | cross-lingual aligned encoder |
| finetune.pt | contrastive fine-tuned encoder (best validation checkpoint) |
| vocab.json | morpheme vocab (token → id) |

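
Since `vocab.json` is described as a token → id mapping, it can presumably be read with the standard `json` module. The following is a minimal sketch under that assumption. The helper name `load_vocab` is hypothetical, and the file is assumed to be a single flat JSON object whose values are unique integer ids.

```python
import json
from pathlib import Path


def load_vocab(path):
    """Load a token -> id JSON mapping and build the reverse id -> token map."""
    with Path(path).open(encoding="utf-8") as f:
        token_to_id = json.load(f)
    if not isinstance(token_to_id, dict):
        raise ValueError("expected a JSON object mapping token -> id")
    id_to_token = {idx: tok for tok, idx in token_to_id.items()}
    if len(id_to_token) != len(token_to_id):
        raise ValueError("duplicate ids found in vocab")
    return token_to_id, id_to_token


if __name__ == "__main__":
    vocab_path = Path("vocab.json")  # assumed to sit in the working directory
    if vocab_path.exists():
        token_to_id, id_to_token = load_vocab(vocab_path)
        print(f"{len(token_to_id)} entries loaded")
```

If the file matches this card, the number of entries should equal the vocab size listed under "Vocab size". The `.pt` checkpoints are not covered here because their internal layout is not documented in this card.
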
## Vocab size

57023
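
As a rough sanity check, the vocab size combined with the hidden size of 512 gives the size of the token embedding table. The arithmetic below assumes a single embedding matrix of shape vocab size × hidden size stored as float32. Neither detail is confirmed by this card, so treat the result as an estimate.

```python
VOCAB_SIZE = 57023   # from this card
HIDDEN_SIZE = 512    # from the Config line of this card
BYTES_PER_PARAM = 4  # assumption: float32 weights

# One embedding row of HIDDEN_SIZE values per vocab entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
embedding_bytes = embedding_params * BYTES_PER_PARAM

print(f"embedding parameters: {embedding_params:,}")
print(f"embedding size: {embedding_bytes / 2**20:.1f} MiB")
```

Under these assumptions the embedding table alone holds 29,195,776 parameters.
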