Update README.md
README.md
CHANGED
@@ -56,12 +56,15 @@ It avoids the quadratic cost of full self-attention by summarizing per-speaker m

## 📈 Performance (on SODA, Masked Language Modeling)

+
+| Model                      | Avg MLM Acc (%) | Best MLM Acc (%) |
+|----------------------------|-----------------|------------------|
+| BERT-base (frozen)         | 33.45           | 45.89            |
+| + 1-layer Transformer      | 68.20           | 76.69            |
+| + 2-layer Transformer      | 71.81           | 79.54            |
+| **+ 1-layer SAUTE (Ours)** | **72.05**       | **80.40**        |
+| + 3-layer Transformer      | 73.50           | 80.84            |
+| **+ 3-layer SAUTE (Ours)** | **75.65**       | **85.55**        |

> SAUTE achieves the best accuracy using fewer parameters than multi-layer transformers.
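The hunk context above notes that SAUTE avoids the quadratic cost of full self-attention by summarizing per-speaker state. As a rough illustration of that idea only (a minimal sketch, not the repository's implementation — the function name, shapes, and mean-pooling choice are all assumptions): each speaker's token states are pooled into a single memory vector, so tokens can attend to S speaker summaries instead of all N tokens, turning the O(N²) attention cost into O(N·S).

```python
import torch

def speaker_summaries(token_states, speaker_ids, num_speakers):
    """Illustrative sketch (hypothetical helper, not the repo's code).

    Mean-pools each speaker's token states into one memory vector.
    token_states: (N, d) encoder hidden states for the whole dialogue
    speaker_ids:  (N,)   integer speaker id in [0, num_speakers) per token
    returns:      (num_speakers, d) one summary vector per speaker
    """
    d = token_states.size(-1)
    sums = torch.zeros(num_speakers, d, dtype=token_states.dtype)
    sums.index_add_(0, speaker_ids, token_states)        # per-speaker sums
    counts = torch.bincount(speaker_ids, minlength=num_speakers)
    return sums / counts.clamp(min=1).unsqueeze(-1)      # per-speaker means

# Tiny usage example: 5 tokens from 2 speakers, hidden size 4.
states = torch.randn(5, 4)
speakers = torch.tensor([0, 0, 1, 1, 0])
print(speaker_summaries(states, speakers, num_speakers=2).shape)  # (2, 4)
```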