KitsuVp committed
Commit c8bde25 · verified · 1 parent: e9c4723

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -68,11 +68,13 @@ NeoLLM is a hybrid architecture language model that combines multiple state-of-t
 NeoLLM incorporates several cutting-edge components:
 
 - **FANformer Integration**: Fourier Analysis Network (FAN) layers for effective periodicity modeling with fan_ratio of 0.125
-- **Hybrid Attention Architecture**: Alternates between full attention and linear attention (Gated Delta Net) layers inspired by Qwen3-Next
+- **Hybrid Attention Architecture**: Follows Qwen3-Next's approach with 1 full attention layer per 3 linear attention layers
 - **Polynomial Composition Activations**: PolyNorm activation functions in MLP layers for enhanced dynamics
 - **Advanced Normalization**: LayerNorm Scaling (LNS) and Gradient-Preserving Activation Scaling (GPAS)
 - **Efficient Linear Attention**: Gated Delta Networks for improved computational efficiency
 
+
+
 ### Architecture Details
 
 - **Model Size**: 110M parameters (77M embeddings + 33M non-embeddings)
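The updated bullet describes a 1-full-per-3-linear attention schedule. A minimal sketch of how such a per-layer schedule could be generated is shown below; the function name, the `period` parameter, and the placement of the full-attention layer at the end of each group of four are illustrative assumptions, not taken from the NeoLLM codebase (Qwen3-Next-style models may position the full-attention layer differently within each group).

```python
# Hypothetical sketch of a "1 full attention layer per 3 linear
# attention layers" schedule, as described in the updated README.
# All names here are illustrative, not from the NeoLLM codebase.

def attention_layer_types(num_layers: int, period: int = 4) -> list[str]:
    """Return a per-layer attention type: every `period`-th layer uses
    full attention, the remaining layers use linear attention
    (e.g. Gated Delta Net)."""
    return [
        "full" if (i + 1) % period == 0 else "linear"
        for i in range(num_layers)
    ]

print(attention_layer_types(8))
# -> ['linear', 'linear', 'linear', 'full',
#     'linear', 'linear', 'linear', 'full']
```

With `period=4`, exactly one layer in four is full attention, matching the 1:3 ratio stated in the diff.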