Update README.md
README.md
CHANGED
|
@@ -88,22 +88,11 @@ Higher = better (larger text compressed into fewer bytes)
|
|
| 88 |
## **Fertility Score (FS)**
|
| 89 |
Lower = better (#tokens produced per grapheme/character)
|
| 90 |
|
| 91 |
-
### **Results
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
| SentencePiece | 8k | 3.100 | 2.445 |
|
| 97 |
-
| BPE | 8k | 3.300 | 2.311 |
|
| 98 |
-
| WordPiece | 8k | 2.343 | 2.486 |
|
| 99 |
-
| **GAT (ours)** | 16k | **3.930** | 1.886 |
|
| 100 |
-
| SentencePiece | 16k | 3.780 | 1.917 |
|
| 101 |
-
| BPE | 16k | 3.540 | 2.640 |
|
| 102 |
-
| WordPiece | 16k | 3.243 | 2.976 |
|
| 103 |
-
| **GAT (ours)** | 32k | **4.806** | 1.627 |
|
| 104 |
-
| SentencePiece | 32k | 3.855 | 1.775 |
|
| 105 |
-
| BPE | 32k | 3.512 | 1.869 |
|
| 106 |
-
| WordPiece | 32k | 3.143 | 1.908 |
|
| 107 |
|
| 108 |
---
|
| 109 |
|
|
|
|
| 88 |
## **Fertility Score (FS)**
|
| 89 |
Lower = better (#tokens produced per grapheme/character)
|
| 90 |
|
| 91 |
+
### **Results for CR and FS**
|
| 92 |
+
|
| 93 |
+
GAT consistently showed a better compression ratio (higher) and fertility score (lower) across vocabulary sizes.
|
| 94 |
+
CR: 3.5 -> 3.9 -> 4.8 (vocab size 8k -> 16k -> 32k)
|
| 95 |
+
FS: 2.1 -> 1.8 -> 1.6 (vocab size 8k -> 16k -> 32k)
|
| 96 |
|
| 97 |
---
|
| 98 |
|
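As a rough illustration of the two metrics named in this hunk, the sketch below computes a fertility score as tokens per character, following the README's wording, and a compression ratio under the assumption that it means UTF-8 bytes of input per token produced. The README does not spell out the exact CR formula, so that definition, the function names, and the two toy tokenizers are hypothetical and are not taken from the GAT codebase.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def fertility_score(text: str, tokenize: Tokenizer) -> float:
    """Tokens produced per character of input (lower is better).

    Note: len(text) counts Unicode code points, which only approximates
    graphemes; a grapheme segmenter would be needed for exact counts.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return len(tokenize(text)) / len(text)


def compression_ratio(text: str, tokenize: Tokenizer) -> float:
    """ASSUMED definition: UTF-8 bytes of input per token (higher is better)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer produced no tokens")
    return len(text.encode("utf-8")) / len(tokens)


# Hypothetical stand-in tokenizers, used only to exercise the metrics.
def char_tokenizer(text: str) -> List[str]:
    return list(text)


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


if __name__ == "__main__":
    sample = "ab cd"
    print(fertility_score(sample, whitespace_tokenizer))
    print(compression_ratio(sample, whitespace_tokenizer))
```

Any real tokenizer can be plugged in by passing a callable that maps a string to a list of tokens. Averaging over a held-out corpus rather than a single string would give numbers comparable to a results table.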