Commit 3ca29ed (verified), committed by thekusaldarshana
1 Parent(s): 89fc0dc

Update README.md

Files changed (1)
  1. README.md +39 -0
README.md CHANGED
@@ -1,3 +1,42 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - polyglots/MADLAD_CulturaX_cleaned
+ language:
+ - si
+ pipeline_tag: feature-extraction
+ library_name: transformers
+ tags:
+ - tokenizer
+ - SGPE
+ - linguis_trie
+ - token
+ - tokenization
+ - remeinium
+ - transformer
+ - linguistics
+ - NLP
+ - sinhala
+ - BPE
+ - GPE
+ model-index:
+ - name: SGPE-Sinhala
+   results:
+   - task:
+       type: feature-extraction
+     dataset:
+       name: MADLAD-400 (CulturaX Cleaned Sinhala subset)
+       type: polyglots/MADLAD_CulturaX_cleaned
+     metrics:
+     - name: Token-to-Word Ratio (TWR)
+       type: twr
+       value: 1.438
+       verified: false
+     - name: Characters per Token (CPT)
+       type: cpt
+       value: 4.48
+       verified: false
+ ---
  # Syllable is the Token: SGPE - Syllable-Aware Grapheme Pair Encoding
  
  **Remeinium Research**
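
The metadata added in this commit reports two metrics, Token-to-Word Ratio (TWR) and Characters per Token (CPT), without stating how they are computed. The following is a minimal sketch of one plausible reading, not the authors' evaluation code. It assumes that TWR is total tokens divided by total whitespace-delimited words, and that CPT is total non-whitespace code points divided by total tokens. The names `twr_and_cpt` and `chunk_tokenize` are hypothetical, and `chunk_tokenize` is only a toy stand-in for the SGPE tokenizer.

```python
from typing import Callable, Iterable, List, Tuple


def twr_and_cpt(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[float, float]:
    """Return (TWR, CPT) over a corpus.

    Assumed definitions (not confirmed by the README):
      TWR = total tokens / total whitespace-delimited words
      CPT = total non-whitespace code points / total tokens
    """
    total_tokens = 0
    total_words = 0
    total_chars = 0
    for text in texts:
        words = text.split()
        total_words += len(words)
        total_chars += sum(len(word) for word in words)
        total_tokens += len(tokenize(text))
    if total_words == 0 or total_tokens == 0:
        raise ValueError("corpus must contain at least one word and one token")
    return total_tokens / total_words, total_chars / total_tokens


def chunk_tokenize(text: str, size: int = 2) -> List[str]:
    """Hypothetical stand-in tokenizer: cut each word into fixed-size chunks."""
    return [
        word[i:i + size]
        for word in text.split()
        for i in range(0, len(word), size)
    ]


if __name__ == "__main__":
    twr, cpt = twr_and_cpt(["abcd ef"], chunk_tokenize)
    print(f"TWR={twr:.3f} CPT={cpt:.2f}")
```

Under these assumptions, a lower TWR means fewer tokens per word and a higher CPT means each token covers more text. Note that `len()` on a Python string counts Unicode code points, so for Sinhala a count based on grapheme clusters would give a different CPT. Which unit the reported 4.48 uses is not stated in the metadata.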