---
language:
- vi
- ede
tags:
- cross-lingual-retrieval
- morpheme-tokenizer
- vanilla-transformer
- EViRAL
---
# Vanilla Transformer + Morpheme Tokenizer — EViRAL
- Task: Ede query → Vietnamese passage retrieval
- Config: 6 layers, hidden size 512, 8 attention heads, FFN size 2048
- Tokenizer: corpus-driven morpheme segmentation plus an Ede-only synonym buffer (Vietnamese as pivot)
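To put the config in perspective, here is a rough parameter count. It is a back-of-the-envelope sketch and assumes a standard encoder layer: biased Q/K/V/output projections, a two-layer FFN with biases, and two LayerNorms per layer. Learned positional embeddings, a final LayerNorm and the MLM head are not counted because the card does not specify them. `EncoderConfig`, `layer_params` and `encoder_params` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    # Values taken from the model card; the layer layout below is an assumption.
    num_layers: int = 6
    hidden: int = 512
    heads: int = 8
    ffn: int = 2048
    vocab_size: int = 57023


def head_dim(cfg: EncoderConfig) -> int:
    # Each attention head works on hidden / heads dimensions.
    assert cfg.hidden % cfg.heads == 0, "hidden size must be divisible by heads"
    return cfg.hidden // cfg.heads


def layer_params(cfg: EncoderConfig) -> int:
    d, f = cfg.hidden, cfg.ffn
    attention = 4 * (d * d + d)                # Q, K, V and output projections, with biases
    feed_forward = (d * f + f) + (f * d + d)   # d -> f and f -> d linear layers, with biases
    layer_norms = 2 * (2 * d)                  # two LayerNorms, each with a weight and a bias
    return attention + feed_forward + layer_norms


def encoder_params(cfg: EncoderConfig) -> int:
    # Stacked layers plus the token embedding table (no positional embeddings counted).
    return cfg.num_layers * layer_params(cfg) + cfg.vocab_size * cfg.hidden


cfg = EncoderConfig()
print(head_dim(cfg))        # 64
print(layer_params(cfg))    # 3152384
print(encoder_params(cfg))  # 48110080
```

Under these assumptions the 57,023 × 512 embedding table (about 29.2M parameters) outweighs the six transformer layers (about 18.9M) combined.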
## Checkpoints
| file | description |
|---|---|
| mlm.pt | MLM pre-trained encoder |
| align.pt | cross-lingual aligned encoder |
| finetune.pt | contrastive fine-tuned encoder (best val) |
| vocab.json | morpheme vocab (token → id) |
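The card does not say how `finetune.pt` scores passages. A common setup for contrastively fine-tuned bi-encoders embeds the query and each passage into one vector and ranks passages by cosine similarity. The sketch below shows only that ranking step, assuming the embeddings have already been produced by the encoder (the pooling method is unknown). `rank_passages` is a hypothetical name, and the toy 2-D vectors stand in for real 512-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    # Score every passage against the query and return the top-k (index, score) pairs.
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for an Ede query embedding and Vietnamese passage embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_passages(query, passages, k=2))  # passage 1 first, then passage 2
```

In practice the passage embeddings would be computed once and reused, so only the query needs to be encoded at search time.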
## Vocabulary size
57,023 tokens
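`vocab.json` is described only as a token → id mapping. The sketch below loads it and encodes text with greedy longest-match segmentation over the vocabulary. That is a swapped-in stand-in, **not** the card's corpus-driven morpheme segmenter, and it ignores the Ede synonym buffer. The `[UNK]` token name and the helpers `load_vocab`, `greedy_segment` and `encode` are hypothetical. The toy English vocabulary is for illustration only.

```python
import json
from typing import Dict, List


def load_vocab(path: str) -> Dict[str, int]:
    # Assumes vocab.json is a flat JSON object mapping token strings to integer ids.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def greedy_segment(word: str, vocab: Dict[str, int]) -> List[str]:
    # Repeatedly take the longest vocabulary entry starting at position i;
    # fall back to a single character when nothing matches.
    pieces: List[str] = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])
            i += 1
    return pieces


def encode(text: str, vocab: Dict[str, int], unk_token: str = "[UNK]") -> List[int]:
    # Whitespace-split, segment each word, and map unknown pieces to the
    # (hypothetical) unknown-token id; raises KeyError if unk_token is absent.
    unk_id = vocab[unk_token]
    return [vocab.get(piece, unk_id)
            for word in text.split()
            for piece in greedy_segment(word, vocab)]


toy_vocab = {"[UNK]": 0, "un": 1, "break": 2, "able": 3, "unbreak": 4}
print(greedy_segment("unbreakable", toy_vocab))  # ['unbreak', 'able']
print(encode("unbreakable zz", toy_vocab))       # [4, 3, 0, 0]
```

With the real file you would call `encode(text, load_vocab("vocab.json"))`, but ids will only match training if the original segmenter is used.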