---
language:
  - vi
  - ede
tags:
  - cross-lingual-retrieval
  - bpe-tokenizer
  - vanilla-transformer
  - EViRAL
---

Vanilla Transformer + BPE — EViRAL

- Task: Ede query → Vietnamese passage retrieval
- Config: 6 layers / hidden 512 / 8 heads / FFN 2048
- Tokenizer: BPE (vocab 32 000, trained from scratch on Ede + Vi corpus)
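To give a feel for the model size implied by this config, here is a minimal sketch that derives the per-head dimension and a rough parameter count. It assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer feed-forward block, two LayerNorms), which this card does not confirm. Positional embeddings, pooling or projection heads, and any weight tying are not described here and are left out. `EncoderConfig`, `head_dim`, `layer_params`, and `estimate_params` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values taken from the Config and Tokenizer entries of this card.
    num_layers: int = 6
    hidden_size: int = 512
    num_heads: int = 8
    ffn_size: int = 2048
    vocab_size: int = 32_000


def head_dim(cfg: EncoderConfig) -> int:
    """Per-head dimension; the hidden size must split evenly across heads."""
    if cfg.hidden_size % cfg.num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return cfg.hidden_size // cfg.num_heads


def layer_params(cfg: EncoderConfig) -> int:
    """Parameters in one encoder layer (assumed standard layout, with biases)."""
    d, f = cfg.hidden_size, cfg.ffn_size
    attention = 4 * (d * d + d)               # Q, K, V and output projections
    feed_forward = (d * f + f) + (f * d + d)  # two linear layers
    layer_norms = 2 * (2 * d)                 # two LayerNorms, scale and shift
    return attention + feed_forward + layer_norms


def estimate_params(cfg: EncoderConfig) -> dict:
    """Rough count: token embeddings plus the encoder stack only."""
    embeddings = cfg.vocab_size * cfg.hidden_size
    encoder = cfg.num_layers * layer_params(cfg)
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "total": embeddings + encoder,
    }


if __name__ == "__main__":
    cfg = EncoderConfig()
    print("head dim:", head_dim(cfg))
    print(estimate_params(cfg))
```

Under these assumptions, the token embedding table and the encoder stack are of comparable size, so the vocabulary choice matters about as much as the depth for the overall footprint.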

Checkpoints

| File | Description |
|---|---|
| `mlm.pt` | MLM pre-trained encoder |
| `align.pt` | Cross-lingual aligned encoder |
| `finetune.pt` | Contrastive fine-tuned encoder (best val) |
| `bpe_tokenizer/tokenizer.json` | BPE tokenizer |
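After downloading the repository, a quick sanity check can confirm that all the files listed above are present and that the tokenizer vocabulary has the expected size. The sketch below uses only the standard library. It assumes that `tokenizer.json` follows the Hugging Face `tokenizers` JSON layout, with the vocabulary stored under `model.vocab`, which this card does not state. The local directory name `vanilla_bpe` and the helpers `missing_files` and `bpe_vocab_size` are hypothetical. The internal format of the `.pt` files is not documented here, so the sketch only checks that they exist and does not try to load them.

```python
import json
from pathlib import Path

# File names as listed in this card.
EXPECTED_FILES = (
    "mlm.pt",
    "align.pt",
    "finetune.pt",
    "bpe_tokenizer/tokenizer.json",
)


def missing_files(root: Path) -> list[str]:
    """Return the expected files that are not present under root."""
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def bpe_vocab_size(tokenizer_json: Path) -> int:
    """Count entries in model.vocab (assumed Hugging Face tokenizers layout)."""
    with tokenizer_json.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return len(data["model"]["vocab"])


if __name__ == "__main__":
    root = Path("vanilla_bpe")  # hypothetical local download directory
    absent = missing_files(root)
    if absent:
        print("missing:", absent)
    else:
        size = bpe_vocab_size(root / "bpe_tokenizer" / "tokenizer.json")
        print("vocab size:", size)
```

If the assumption about the file layout holds, the reported size should be close to the 32 000 stated above. Special tokens stored outside `model.vocab` would not be counted by this check.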