Update README.md
README.md
CHANGED
@@ -36,8 +36,8 @@ HRWKV7-Reka-Flash3-Preview is an experimental hybrid architecture model that com
 The model implements several key improvements over standard RWKV architectures:
 
 1. **Token Shift Removal**: Unlike traditional RWKV, the hxa079 variant removes token shifting mechanisms
-2. **GroupNorm Removal**: Eliminates GroupNorm
-3. **k_first Introduction**:
+2. **GroupNorm Removal**: Eliminates GroupNorm for training stability
+3. **k_first Introduction**: Experimentally adopted the approach of residually connecting k layers in layer 0.
 
 ### Hybrid Design Benefits
 