Update README.md
README.md CHANGED

@@ -45,7 +45,7 @@ The model implements several key improvements over standard RWKV architectures:
 
 ### Hybrid Design Benefits
 
-- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and
+- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KV cache to 1/7 the size of full GQA.
 - **Enhanced Needle Tasks**: Strategic placement of GQA layers significantly improves performance on needle-in-haystack retrieval tasks, addressing a known limitation of pure linear attention models
 - **Implicit Position Encoding**: Interestingly, the model achieves better performance when RoPE (Rotary Position Embedding) is not applied to GQA layers, suggesting that RWKV blocks provide implicit positional encoding capabilities
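The 1/7 KV-cache figure in the changed line follows directly from the layer layout: in a hybrid model where only a fraction of the layers are GQA (the rest being RWKV blocks with constant-size recurrent state), the KV cache scales with the number of attention layers rather than the total depth. The sketch below illustrates this arithmetic; the layer count, head counts, and dimensions are illustrative assumptions, not values taken from this model.

```python
# Hypothetical sketch of the KV-cache saving: if 1 in every 7 layers is a
# GQA layer (the rest are RWKV blocks that keep O(1) state per layer),
# the KV cache is 1/7 of what a full-GQA stack of the same depth needs.
# All dimensions below are assumed for illustration only.

def kv_cache_bytes(num_attn_layers, seq_len, num_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Bytes of KV cache: two tensors (K and V) per attention layer."""
    return 2 * num_attn_layers * seq_len * num_kv_heads * head_dim * bytes_per_elem

layers = 28                                   # assumed total depth
full_gqa = kv_cache_bytes(num_attn_layers=layers, seq_len=4096)
hybrid = kv_cache_bytes(num_attn_layers=layers // 7, seq_len=4096)  # 1 GQA per 7 layers

print(full_gqa // hybrid)  # prints 7
```

The RWKV layers are not free, but their recurrent state is fixed-size per layer, so unlike the KV cache it does not grow with sequence length.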