Vanilla Transformer + Morpheme Tokenizer - EViRAL

Task: Ede query → Vietnamese passage retrieval
Config: 6 layers / hidden 512 / 8 heads / FFN 2048
Tokenizer: corpus-driven morpheme segmentation + Ede-only synonym buffer (Vi as pivot)

Checkpoints

file          description
mlm.pt        MLM pre-trained encoder
align.pt      cross-lingual aligned encoder
finetune.pt   contrastive fine-tuned encoder (best val)
vocab.json    morpheme vocab (token → id)
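
As a minimal sketch of how vocab.json might be consumed, assuming it is a flat JSON object mapping each morpheme string to an integer id (as the "token → id" description suggests): the helper names `load_vocab` and `encode` and the `unk_id` fallback are hypothetical and not part of this repository, and the morpheme segmentation step itself is not shown.

```python
import json
from pathlib import Path


def load_vocab(path):
    """Load a token -> id mapping from a JSON file.

    Assumes the file is a flat JSON object such as {"token": 0, ...}.
    """
    with Path(path).open(encoding="utf-8") as f:
        raw = json.load(f)
    return {str(token): int(idx) for token, idx in raw.items()}


def encode(morphemes, vocab, unk_id=None):
    """Map already-segmented morphemes to ids.

    unk_id is a hypothetical fallback for out-of-vocabulary morphemes;
    when it is None, an unknown morpheme raises KeyError instead.
    """
    ids = []
    for morpheme in morphemes:
        if morpheme in vocab:
            ids.append(vocab[morpheme])
        elif unk_id is not None:
            ids.append(unk_id)
        else:
            raise KeyError(f"morpheme not in vocab: {morpheme!r}")
    return ids
```

Calling `load_vocab("vocab.json")` on the downloaded file and passing the result to `encode` with a list of segmented morphemes would produce the id sequence; how the released encoders expect special tokens or padding to be added is not stated here.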

Vocab size

57023
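
For a rough sense of model size, the sketch below estimates the parameter count implied by the listed config and vocab size. It assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer feed-forward block, and two layer norms) and counts only the token embedding plus the encoder layers; positional embeddings, any final norm, and task heads are left out because the card does not describe them. The function name is hypothetical.

```python
def encoder_param_estimate(vocab_size, hidden, layers, ffn, heads):
    """Estimate parameters for a standard Transformer encoder (assumed layout)."""
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")

    head_dim = hidden // heads
    embedding = vocab_size * hidden                            # token embedding matrix
    attention = 4 * (hidden * hidden + hidden)                 # Q, K, V, output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * 2 * hidden                               # two norms, scale and bias each
    per_layer = attention + feed_forward + layer_norms

    return {
        "head_dim": head_dim,
        "embedding": embedding,
        "per_layer": per_layer,
        "encoder_layers": layers * per_layer,
        "total": embedding + layers * per_layer,
    }


# Values taken from the Config line and the Vocab size section.
estimate = encoder_param_estimate(
    vocab_size=57023, hidden=512, layers=6, ffn=2048, heads=8
)
for name, value in estimate.items():
    print(f"{name}: {value:,}")
```

Under these assumptions, the embedding table (about 29.2M parameters) is larger than the six encoder layers combined (about 18.9M), which is typical for a small encoder paired with a large vocabulary.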

