---
language:
- vi
- ede
tags:
- cross-lingual-retrieval
- morpheme-tokenizer
- vanilla-transformer
- EViRAL
---

# Vanilla Transformer + Morpheme Tokenizer — EViRAL

Task: Ede query → Vietnamese passage retrieval

Config: 6 layers / hidden 512 / 8 heads / FFN 2048

Tokenizer: corpus-driven morpheme segmentation + Ede-only synonym buffer (Vi as pivot)

## Checkpoints

| file | description |
|---|---|
| mlm.pt | MLM pre-trained encoder |
| align.pt | cross-lingual aligned encoder |
| finetune.pt | contrastive fine-tuned encoder (best val) |
| vocab.json | morpheme vocab (token → id) |

## Vocab size

57023
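
A minimal sketch of reading `vocab.json` and checking its size, assuming the file is a flat JSON object that maps each morpheme token to an integer id (as the checkpoint table describes). The helper name `load_vocab` and the local path are hypothetical and are not taken from this repository's code.

```python
import json
from pathlib import Path


def load_vocab(path):
    """Load a token -> id mapping and build the inverse id -> token mapping.

    Assumption: the file is a flat JSON object, e.g. {"tok_a": 0, "tok_b": 1}.
    """
    token_to_id = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(token_to_id, dict):
        raise ValueError("expected a JSON object mapping token -> id")
    # Invert the mapping so ids can be decoded back to morphemes.
    id_to_token = {idx: tok for tok, idx in token_to_id.items()}
    if len(id_to_token) != len(token_to_id):
        raise ValueError("duplicate ids found in vocab")
    return token_to_id, id_to_token


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    vocab_path = Path("vocab.json")
    if vocab_path.exists():
        token_to_id, id_to_token = load_vocab(vocab_path)
        # Compare this count with the vocab size reported on this card.
        print(len(token_to_id))
```

If the printed count differs from 57023, the file most likely does not belong to these checkpoints.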