Commit db9d66b (verified), committed by HeyDunaX
1 parent: 5caeba5

add model card

Files changed (1): README.md (+7 -9)
README.md CHANGED
@@ -4,23 +4,21 @@ language:
  - ede
  tags:
  - cross-lingual-retrieval
- - morpheme-tokenizer
+ - bpe-tokenizer
  - vanilla-transformer
  - EViRAL
  ---
 
- # Vanilla Transformer + Morpheme Tokenizer — EViRAL
+ # Vanilla Transformer + BPE — EViRAL
 
- Task: Ede query → Vietnamese passage retrieval
- Config: 6 layers / hidden 512 / 8 heads / FFN 2048
- Tokenizer: corpus-driven morpheme segmentation + Ede-only synonym buffer
+ Task: Ede query → Vietnamese passage retrieval
+ Config: 6 layers / hidden 512 / 8 heads / FFN 2048
+ Tokenizer: BPE (vocab 32 000, trained from scratch on Ede + Vi corpus)
 
  ## Checkpoints
  | file | description |
- |------|-------------|
+ |---|---|
  | mlm.pt | MLM pre-trained encoder |
  | align.pt | cross-lingual aligned encoder |
  | finetune.pt | contrastive fine-tuned encoder (best val) |
-
- ## Vocab size
- `32000`
+ | bpe_tokenizer/tokenizer.json | BPE tokenizer |
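The new card states the encoder shape (6 layers, hidden 512, 8 heads, FFN 2048) and a 32 000-entry BPE vocabulary. As a rough guide to model size, the sketch below counts parameters under stated assumptions that the card does not confirm: each layer is assumed to follow the layout of PyTorch's `nn.TransformerEncoderLayer` (biased Q/K/V and output projections, two biased feed-forward linears, two LayerNorms), there is a single token-embedding table, and positional embeddings, any final LayerNorm, and task heads are left out. `EncoderConfig` and `encoder_params` are hypothetical names used only for illustration.

```python
# Back-of-the-envelope parameter count for the encoder config in the model card.
# ASSUMPTIONS (not stated in the card): layers laid out like
# torch.nn.TransformerEncoderLayer (biases everywhere, two LayerNorms per layer),
# one token-embedding table, and no positional embeddings, final LayerNorm,
# or MLM / projection heads counted.
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:  # hypothetical helper, defaults taken from the card
    num_layers: int = 6
    hidden: int = 512
    heads: int = 8
    ffn: int = 2048
    vocab: int = 32_000


def layer_params(cfg: EncoderConfig) -> int:
    d, f = cfg.hidden, cfg.ffn
    attention = 4 * (d * d + d)                # Q, K, V and output projections, each with bias
    feed_forward = (d * f + f) + (f * d + d)   # two linear layers, each with bias
    layer_norms = 2 * (2 * d)                  # two LayerNorms, weight + bias each
    return attention + feed_forward + layer_norms


def encoder_params(cfg: EncoderConfig) -> dict:
    if cfg.hidden % cfg.heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    per_layer = layer_params(cfg)
    embeddings = cfg.vocab * cfg.hidden
    return {
        "head_dim": cfg.hidden // cfg.heads,
        "per_layer": per_layer,
        "token_embeddings": embeddings,
        "total": embeddings + cfg.num_layers * per_layer,
    }


if __name__ == "__main__":
    for name, value in encoder_params(EncoderConfig()).items():
        print(f"{name:>16}: {value:,}")
```

Under these assumptions each attention head works on 64 dimensions, each layer holds 3,152,384 parameters, and the total comes to 35,298,304, of which the 32 000 x 512 embedding table accounts for 16,384,000. The vocabulary therefore contributes almost as much as all six layers combined, which is one practical consequence of the switch to a 32 000-token BPE.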
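The card describes `finetune.pt` as a contrastive fine-tuned encoder for Ede query to Vietnamese passage retrieval but does not say which contrastive objective was used. A common choice for this setup is an InfoNCE-style loss with in-batch negatives, sketched below in plain Python as an assumption, not as the author's training code. The temperature of 0.05, the cosine similarity scoring, and the toy 2-d vectors standing in for pooled encoder outputs are all illustrative; `in_batch_contrastive_loss` and `rank_passages` are hypothetical helpers.

```python
# Hypothetical sketch: in-batch contrastive (InfoNCE-style) loss for
# query -> passage pairs, plus cosine-similarity ranking at retrieval time.
# ASSUMPTIONS: the card does not specify the loss, temperature, or pooling;
# temperature=0.05 and the toy vectors below are illustrative only.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_contrastive_loss(
    queries: List[Vector], passages: List[Vector], temperature: float = 0.05
) -> float:
    """Mean cross-entropy where passages[i] is the positive for queries[i]
    and every other passage in the batch serves as a negative."""
    if len(queries) != len(passages):
        raise ValueError("need exactly one positive passage per query")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(queries)


def rank_passages(query: Vector, passages: List[Vector]) -> List[int]:
    """Passage indices sorted from most to least similar to the query."""
    return sorted(range(len(passages)), key=lambda j: cosine(query, passages[j]), reverse=True)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]   # stand-ins for encoded Ede queries
    passages = [[0.9, 0.1], [0.2, 0.8]]  # stand-ins for encoded Vietnamese passages
    print(in_batch_contrastive_loss(queries, passages))
    print(rank_passages(queries[1], passages))  # -> [1, 0]
```

In-batch negatives let every other passage in a batch act as a negative at no extra encoding cost, which is one reason this objective is popular for fine-tuning a retrieval encoder. A lower loss means each query sits closer to its own passage than to the rest of the batch.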