Commit 77eb1a4 (verified), committed by varuni · 1 Parent(s): b1d22c7

Update README.md

Files changed (1)
  1. README.md +5 -16
README.md CHANGED
@@ -88,22 +88,11 @@ Higher = better (larger text compressed into fewer bytes)
 ## **Fertility Score (FS)**
 Lower = better (#tokens produced per grapheme/character)
 
-### **Results Across Vocabulary Sizes**
-
-| Tokenizer | Vocab | CR | FS |
-|-----------|-------|-------|-------|
-| **GAT (ours)** | 8k | **3.588** | 2.168 |
-| SentencePiece | 8k | 3.100 | 2.445 |
-| BPE | 8k | 3.300 | 2.311 |
-| WordPiece | 8k | 2.343 | 2.486 |
-| **GAT (ours)** | 16k | **3.930** | 1.886 |
-| SentencePiece | 16k | 3.780 | 1.917 |
-| BPE | 16k | 3.540 | 2.640 |
-| WordPiece | 16k | 3.243 | 2.976 |
-| **GAT (ours)** | 32k | **4.806** | 1.627 |
-| SentencePiece | 32k | 3.855 | 1.775 |
-| BPE | 32k | 3.512 | 1.869 |
-| WordPiece | 32k | 3.143 | 1.908 |
+### **Results for CR and FS**
+
+GAT consistently showed better compression ratio and fertility score across vocab sizes .
+CR : 3.5 -> 3.9 -> 4.8
+FS : 2.1 -> 1.8 -> 1.6
 
 ---
 
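For readers comparing the CR and FS figures in this diff, the two metrics might be computed roughly as in the sketch below. This is an illustration under stated assumptions, not the README's own evaluation code: `fertility_score` and `compression_ratio` are hypothetical helper names, fertility is taken as tokens per character (code points stand in for graphemes), and the compression ratio is assumed to be UTF-8 bytes per token, which the README does not spell out. Any callable that maps a string to a list of tokens can be passed as `tokenize`.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def fertility_score(text: str, tokenize: Tokenizer) -> float:
    """Tokens produced per character (lower is better).

    Assumption: code points approximate graphemes; a grapheme-aware
    segmenter would be needed for scripts with combining marks.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return len(tokenize(text)) / len(text)


def compression_ratio(text: str, tokenize: Tokenizer) -> float:
    """UTF-8 bytes of input per token (higher is better).

    Assumption: this bytes-per-token definition is illustrative only;
    the README may define CR differently.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer produced no tokens")
    return len(text.encode("utf-8")) / len(tokens)


# Toy tokenizers for demonstration: whitespace splitting and per-character.
sample = "ab cd"
fs_words = fertility_score(sample, str.split)    # 2 tokens / 5 characters
cr_words = compression_ratio(sample, str.split)  # 5 bytes / 2 tokens
fs_chars = fertility_score(sample, list)         # 5 tokens / 5 characters
```

Under these assumed definitions, a tokenizer that emits fewer tokens for the same text lowers FS and raises CR at the same time, which matches the direction of the trend reported for GAT as the vocabulary grows.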