Update README.md
README.md
CHANGED
@@ -35,8 +35,8 @@ HRWKV7-Reka-Flash3-Preview is an experimental hybrid architecture model that com
 
 The model implements several key improvements over standard RWKV architectures:
 
-1. **Token Shift Removal**:
-2. **GroupNorm Removal**:
+1. **Token Shift Removal**: In order to effectively inherit the teacher model weights, we removed the residual connection one token ago.
+2. **GroupNorm Removal**: Helps improve training stability issues
 3. **k_first Introduction**: Experimentally adopted the approach of residually connecting k layers in layer 0.
 
 ### Hybrid Design Benefits
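
For readers unfamiliar with the term in the "Token Shift Removal" item, here is a minimal sketch of what a generic RWKV-style token shift does and what removing it amounts to. It is an illustration under stated assumptions, not this model's code: the function names, the per-channel mix vector `mu`, and the zero padding for the first token are hypothetical and follow the commonly described RWKV formulation rather than anything in this README.

```python
from typing import List

Vector = List[float]


def token_shift(xs: List[Vector], mu: Vector) -> List[Vector]:
    """Generic RWKV-style token shift (assumed form, not this model's code).

    Each output blends the current token with the previous one, per channel:
        out[t] = mu * xs[t] + (1 - mu) * xs[t - 1]
    The token before the first one is assumed to be all zeros.
    """
    prev = [0.0] * len(mu)
    out = []
    for x in xs:
        out.append([m * cur + (1.0 - m) * old for m, cur, old in zip(mu, x, prev)])
        prev = x
    return out


def no_token_shift(xs: List[Vector]) -> List[Vector]:
    """With token shift removed, each position sees only its own token."""
    return [list(x) for x in xs]


# Two tokens with two channels each; mu is a hypothetical mix vector.
sequence = [[1.0, 2.0], [3.0, 4.0]]
mixed = token_shift(sequence, mu=[0.5, 1.0])
plain = no_token_shift(sequence)
```

In this sketch, a channel with `mu = 1.0` passes the current token through unchanged, so removing token shift behaves like fixing every channel at that value. Presumably this keeps the inputs to the inherited projections in the same form the teacher saw, which is the motivation the README states.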