Update README.md

README.md CHANGED

@@ -50,7 +50,7 @@ It avoids the quadratic cost of full self-attention by summarizing per-speaker m
 - 🧠 **Speaker-Aware Memory**: Structured per-speaker representation of dialogue context.
 - ⚡ **Linear Attention**: Efficient and scalable to long dialogues.
 - 🧩 **Pretrained Transformer Compatible**: Can plug into frozen or fine-tuned BERT models.
-- 🪶 **Lightweight**:
+- 🪶 **Lightweight**: ~4M parameters less than 2-layer with strong MLM performance improvements.

 ---
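The hunk context above says the model avoids the quadratic cost of full self-attention by summarizing per-speaker memory, and the bullets advertise linear attention. A minimal NumPy sketch of those two ideas follows; the function names and the `elu(x) + 1` feature map are illustrative assumptions, not taken from this repository's code:

```python
import numpy as np

def elu_feature(x):
    # phi(x) = elu(x) + 1: a positive feature map commonly used to
    # kernelize attention so it can be computed in linear time.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention in O(N * d^2) instead of O(N^2 * d).

    All keys/values are summarized once into a (d, d_v) matrix, so cost
    grows linearly with dialogue length N rather than quadratically.
    """
    qf, kf = elu_feature(q), elu_feature(k)
    kv = kf.T @ v                    # (d, d_v): one summary of all keys/values
    z = kf.sum(axis=0)               # (d,): normalization term
    return (qf @ kv) / (qf @ z)[:, None]

def speaker_memory(hidden, speaker_ids):
    """Per-speaker memory: mean-pool token states by speaker id."""
    return {s: hidden[speaker_ids == s].mean(axis=0)
            for s in np.unique(speaker_ids)}

# Usage: 6 token states of width 4, from two speakers.
rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))
out = linear_attention(h, h, h)                       # (6, 4)
mem = speaker_memory(h, np.array([0, 0, 1, 1, 1, 0])) # one vector per speaker
```

Hooking such a module onto frozen BERT hidden states (as the "Pretrained Transformer Compatible" bullet suggests) only adds the projection and memory parameters, which is consistent with the diff's small-parameter-count claim.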