---
language:
  - vi
  - ede
tags:
  - cross-lingual-retrieval
  - morpheme-tokenizer
  - vanilla-transformer
  - EViRAL
---

Vanilla Transformer + Morpheme Tokenizer — EViRAL

- Task: Ede query → Vietnamese passage retrieval
- Config: 6 layers / hidden 512 / 8 heads / FFN 2048
- Tokenizer: corpus-driven morpheme segmentation + Ede-only synonym buffer (Vietnamese as pivot)
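The retrieval task above amounts to embedding an Ede query and a set of Vietnamese passages, then ranking the passages by similarity to the query. This card does not state the similarity function or how token states are pooled, so the sketch below assumes cosine similarity over fixed-size vectors that have already been produced by the encoder. The function names `cosine` and `rank_passages` and the toy vectors are hypothetical and not part of this repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real 512-dimensional embeddings.
query = [1.0, 0.0, 1.0]               # hypothetical Ede query embedding
passages = [
    [0.0, 1.0, 0.0],                  # orthogonal to the query
    [1.0, 0.0, 0.9],                  # nearly parallel to the query
    [0.5, 0.5, 0.5],                  # partially aligned
]
ranking = rank_passages(query, passages, top_k=2)
print(ranking)
```

In this toy example passage 1 ranks first and passage 2 second. With real embeddings, a vectorized implementation would be preferable to pure-Python loops.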

Checkpoints

| File | Description |
| --- | --- |
| `mlm.pt` | MLM pre-trained encoder |
| `align.pt` | cross-lingual aligned encoder |
| `finetune.pt` | contrastive fine-tuned encoder (best val) |
| `vocab.json` | morpheme vocab (token → id) |
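A minimal sketch of how these files might be read, assuming `vocab.json` is a flat JSON object mapping each morpheme string to an integer id, as the "token → id" description suggests. The helper names, the `[UNK]` token, and the toy vocabulary are hypothetical. The internal layout of the `.pt` files is not documented here, so `load_checkpoint` only returns whatever object `torch.load` yields, and it imports PyTorch lazily so the vocab helpers work without it.

```python
import json
from typing import Any, Dict, List


def load_vocab(path: str) -> Dict[str, int]:
    """Read a token -> id mapping from a JSON file (assumed flat object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def invert_vocab(vocab: Dict[str, int]) -> Dict[int, str]:
    """Build the reverse id -> token mapping for decoding."""
    return {idx: tok for tok, idx in vocab.items()}


def encode(morphemes: List[str], vocab: Dict[str, int], unk_id: int) -> List[int]:
    """Map already-segmented morphemes to ids, falling back to unk_id."""
    return [vocab.get(m, unk_id) for m in morphemes]


def load_checkpoint(path: str) -> Any:
    """Load a .pt file on CPU; the returned structure is not specified by this card."""
    import torch  # imported lazily so the helpers above need only the stdlib
    return torch.load(path, map_location="cpu")


# Toy vocabulary for illustration only; the real vocab.json has 57023 entries
# and its unknown-token name (if any) is an assumption here.
toy_vocab = {"[UNK]": 0, "a": 1, "b": 2}
ids = encode(["a", "zz", "b"], toy_vocab, unk_id=toy_vocab["[UNK]"])
print(ids)
```

Inspecting the type and keys of the loaded checkpoint object is a reasonable first step before trying to restore weights into a model. Segmenting raw text into morphemes is the tokenizer's job and is not covered by this sketch.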

Vocab size

57023
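To put the vocabulary size in context, the following sketch estimates parameter counts from the configuration listed above (6 layers, hidden 512, 8 heads, FFN 2048). It assumes a standard Transformer encoder layer with biases and two layer norms, and it ignores positional embeddings and any task-specific heads, so the real checkpoints may differ. The `EncoderConfig` class is hypothetical and not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    num_layers: int = 6
    hidden_size: int = 512
    num_heads: int = 8
    ffn_size: int = 2048
    vocab_size: int = 57023

    @property
    def head_dim(self) -> int:
        """Per-head dimension; hidden size must divide evenly across heads."""
        return self.hidden_size // self.num_heads

    def embedding_params(self) -> int:
        """Token embedding table: one hidden-size vector per vocab entry."""
        return self.vocab_size * self.hidden_size

    def layer_params(self) -> int:
        """One encoder layer under the standard-layer assumption."""
        h, f = self.hidden_size, self.ffn_size
        attention = 4 * (h * h + h)             # Q, K, V, output projections
        feed_forward = (h * f + f) + (f * h + h)
        layer_norms = 2 * (2 * h)               # scale and shift, twice
        return attention + feed_forward + layer_norms

    def total_params(self) -> int:
        return self.embedding_params() + self.num_layers * self.layer_params()


cfg = EncoderConfig()
print(cfg.head_dim)            # 64
print(cfg.embedding_params())  # 29195776
print(cfg.layer_params())      # 3152384
print(cfg.total_params())      # 48110080
```

Under these assumptions the embedding table (about 29.2M parameters) is larger than the six encoder layers combined (about 18.9M), which is typical when a large vocabulary is paired with a compact encoder.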