kgrabko committed · Commit 343aa9f · verified · Parent(s): af4b969

Update README.md
---
license: apache-2.0
---
I have added my empty models using the GPT-3 Standard as well as the Llama 3 and Mistral architectures. My smaller GPT-2 models use LayerNorm and FFN layers, whereas for the larger models I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures, including models with 8B, 33B, 70B, and 120B parameters.
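As a rough illustration of the LayerNorm-to-RMSNorm swap described above, here is a minimal pure-Python sketch (toy code for illustration, not the actual model implementation): RMSNorm drops the mean-centering and bias term that LayerNorm uses, which makes it cheaper and gives it fewer parameters per layer.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # Classic GPT-2-style LayerNorm: center by the mean, scale by the std.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def rms_norm(x, gamma, eps=1e-5):
    # Llama/Mistral-style RMSNorm: no mean subtraction and no bias,
    # only a learned scale `gamma` over the root-mean-square of x.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gamma)]
```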

The transformer blocks are not frozen, which gives you full freedom to tune the model from scratch.

My GPT-2 architecture is similar to the classic GPT-2 transformer:
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
TransformerBlock -> MultiHeadAttention
TransformerBlock -> LayerNorm
TransformerBlock -> LayerNorm
TransformerBlock -> ffn --> Linear
TransformerBlock -> ffn --> Activation::gelu
TransformerBlock -> ffn --> Linear
LayerNorm
Linear
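The ffn stage of the GPT-2 block above (Linear -> GELU -> Linear) can be sketched in plain Python. This is a toy sketch, not the model's real code: the tanh-approximate GELU below is the variant GPT-2 popularized, and the weight matrices are placeholders you would supply.

```python
import math

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def linear(x, w, b):
    # y = W x + b, with W as a plain list-of-lists matrix
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def ffn(x, w1, b1, w2, b2):
    # GPT-2 feed-forward: Linear -> GELU -> Linear
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)
```

In GPT-2 the first Linear expands to 4x the model width and the second projects back down; the toy matrices above just have to follow those shapes.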


My GPT-3 architecture is similar to Llama 3 and Mistral:
CustomEmbedding
# Positional Embedding removed; RoPE is integrated into Attention
TransformerBlock -> MultiHeadAttention
TransformerBlock -> SwiGLUFeedForward -> Linear # Gate Layer
TransformerBlock -> SwiGLUFeedForward -> Linear # Up Layer
TransformerBlock -> SwiGLUFeedForward -> Linear # Projection (Down) Layer
TransformerBlock -> RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
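The three Linear layers of the SwiGLUFeedForward listed above combine as down(silu(gate(x)) * up(x)). A minimal sketch, assuming bias-free projections as in Llama/Mistral; the list-of-lists weights are made-up placeholders, not real parameters:

```python
import math

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    # Bias-free projection, as in Llama/Mistral-style linears.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: the gate path goes through SiLU and
    # multiplies the up path element-wise, then the down projection
    # maps the result back to the model width.
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```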

CMS Manhattan Copyright (c) 2002-2026


The LayerNorm and FFN layers were replaced with RMSNorm and SwiGLU, as noted above.