Update README.md
README.md CHANGED
@@ -68,11 +68,13 @@ NeoLLM is a hybrid architecture language model that combines multiple state-of-t
 NeoLLM incorporates several cutting-edge components:
 
 - **FANformer Integration**: Fourier Analysis Network (FAN) layers for effective periodicity modeling with fan_ratio of 0.125
-- **Hybrid Attention Architecture**:
+- **Hybrid Attention Architecture**: Follows Qwen3-Next's approach with 1 full attention layer per 3 linear attention layers
 - **Polynomial Composition Activations**: PolyNorm activation functions in MLP layers for enhanced dynamics
 - **Advanced Normalization**: LayerNorm Scaling (LNS) and Gradient-Preserving Activation Scaling (GPAS)
 - **Efficient Linear Attention**: Gated Delta Networks for improved computational efficiency
 
+
+
 ### Architecture Details
 
 - **Model Size**: 110M parameters (77M embeddings + 33M non-embeddings)
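The 1:3 interleaving stated in the new **Hybrid Attention Architecture** bullet can be sketched as follows. This is a minimal illustration, not code from the repository: the placement of the full-attention layer within each block of four is an assumption — the README only states the ratio.

```python
def attention_type(layer_idx: int, full_every: int = 4) -> str:
    """Return the attention type for a layer in a Qwen3-Next-style stack.

    Assumes blocks of `full_every` layers: the first three use linear
    attention (e.g. Gated DeltaNet) and the last uses full attention.
    Only the 1 full : 3 linear ratio is given in the README; the exact
    position of the full-attention layer is a guess for illustration.
    """
    return "full" if layer_idx % full_every == full_every - 1 else "linear"


# Layout of the first 8 layers under this assumption:
layout = [attention_type(i) for i in range(8)]
# → ['linear', 'linear', 'linear', 'full',
#    'linear', 'linear', 'linear', 'full']
```

With this pattern, a 12-layer model would contain 3 full-attention layers and 9 linear-attention layers, matching the stated ratio.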