Commit 4c6d445 by HeyDunaX (verified) · 1 parent: 6f9800e

add model card

  1. README.md +27 -0
README.md ADDED
@@ -0,0 +1,27 @@
+ ---
+ language:
+ - vi
+ - ede
+ tags:
+ - cross-lingual-retrieval
+ - morpheme-tokenizer
+ - vanilla-transformer
+ - EViRAL
+ ---
+
+ # Vanilla Transformer + Morpheme Tokenizer — EViRAL
+
+ Task: Ede query → Vietnamese passage retrieval
+ Config: 6 layers / hidden 512 / 8 heads / FFN 2048
+ Tokenizer: corpus-driven morpheme segmentation + Ede-only synonym buffer (Vi as pivot)
+
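As a rough sanity check on the stated config (6 layers / hidden 512 / 8 heads / FFN 2048), the sketch below derives the per-head dimension and estimates the encoder's parameter count. It assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer feed-forward block with biases, and two LayerNorms) and ignores embeddings and any task heads. `EncoderConfig`, `layer_params`, and `encoder_params` are hypothetical names for illustration, not identifiers from the repository, and the real model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the values stated in the model card."""
    num_layers: int = 6
    hidden_size: int = 512
    num_heads: int = 8
    ffn_size: int = 2048

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


def layer_params(cfg: EncoderConfig) -> int:
    """Parameter count of one encoder layer under the assumptions above."""
    d, f = cfg.hidden_size, cfg.ffn_size
    attention = 4 * (d * d + d)               # Q, K, V, output projections + biases
    feed_forward = (d * f + f) + (f * d + d)  # two linear layers + biases
    layer_norms = 2 * (2 * d)                 # scale and shift for two LayerNorms
    return attention + feed_forward + layer_norms


def encoder_params(cfg: EncoderConfig) -> int:
    """Parameter count of the full encoder stack, excluding embeddings."""
    return cfg.num_layers * layer_params(cfg)


cfg = EncoderConfig()
print(cfg.head_dim)         # 64
print(layer_params(cfg))    # 3152384
print(encoder_params(cfg))  # 18914304
```

Under these assumptions the encoder stack holds roughly 18.9M parameters before the embedding table is counted.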
+ ## Checkpoints
+ | file | description |
+ |---|---|
+ | mlm.pt | MLM pre-trained encoder |
+ | align.pt | cross-lingual aligned encoder |
+ | finetune.pt | contrastive fine-tuned encoder (best val) |
+ | vocab.json | morpheme vocab (token → id) |
+
+ ## Vocab size
+ 62198
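Since `vocab.json` is described as a token → id mapping, a minimal sketch of loading it and comparing its size against the stated 62198 might look like the following. It assumes the file is a flat JSON object of string tokens to integer ids. The helper names and the toy entries (including the `[PAD]` and `[UNK]` tokens) are hypothetical, used only so the example runs without the real file.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_VOCAB_SIZE = 62198  # value stated in the model card


def load_vocab(path):
    """Load a token -> id mapping from a JSON file (assumed flat object)."""
    with open(path, encoding="utf-8") as f:
        token_to_id = json.load(f)
    if not isinstance(token_to_id, dict):
        raise ValueError("expected a JSON object mapping token -> id")
    return token_to_id


def invert_vocab(token_to_id):
    """Build the id -> token mapping, rejecting duplicate ids."""
    id_to_token = {idx: token for token, idx in token_to_id.items()}
    if len(id_to_token) != len(token_to_id):
        raise ValueError("duplicate ids found in vocab")
    return id_to_token


# Toy stand-in for the real vocab.json; these entries are illustrative only.
toy = {"[PAD]": 0, "[UNK]": 1, "xin": 2, "chào": 3}
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "vocab.json"
    toy_path.write_text(json.dumps(toy, ensure_ascii=False), encoding="utf-8")
    vocab = load_vocab(toy_path)

id_to_token = invert_vocab(vocab)
size_matches = len(vocab) == EXPECTED_VOCAB_SIZE
print(len(vocab), size_matches)  # the toy vocab is far smaller than 62198
```

With the real file, pointing `load_vocab` at the downloaded `vocab.json` and checking `size_matches` gives a quick consistency test between the vocab and the checkpoint's embedding table.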