kgrabko committed
Commit e8bddc9 · verified · 1 Parent(s): 242b971

Update README.md

Files changed (1):
  1. README.md +61 -22

README.md CHANGED
@@ -1,42 +1,81 @@
  ---
  license: apache-2.0
  ---
- I have added my empty models using GPT-3 Standard as well as Llama 3 and Mistral architectures.
- My smaller GPT-2 models utilize LayerNorm and FFN layers, whereas for larger models,
- I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures,
- including models with 8B, 33B, 70B, and 120B parameters.

- So please GPT-2 huggingface tokenizer for english and for multi languages bert tokenizer from huggingface library .

- Transformer block is not frozen that give more power to tune model from scratch

- My GPT-2 Archtecure similar classic GPT-2 transfomer

  CustomEmbedding
  FrozenSignatureLayer
  LearnedPositionalEmbedding
- TransformerBlock -> MultiHeadAttention
- TransformerBlock -> LayerNorm
- TransformerBlock -> LayerNorm
- TransformerBlock -> ffn --> Linear
- TransformerBlock -> ffn --> Activation::gelu
- TransformerBlock -> ffn --> Linear
  LayerNorm
  Linear

- My GPT-3 Archtecure similar as LLAMA 3 and Mistral
  CustomEmbedding
- # Positional Embedding removed, RoPE intergared in Attention
- TransformerBlock -> MultiHeadAttention
- TransformerBlock -> SwiGLUFeedForward - > Linear # Gate Layer
- TransformerBlock -> SwiGLUFeedForward - > Linear # Up Layer
- TransformerBlock -> SwiGLUFeedForward - > Linear # Projection (Down) Layer
- TransformerBlock -> RMSNorm
  RMSNorm
  Linear
  FrozenSignatureLayer

- CMS Manhattan Copyright: Copyright (c) 2002-2026

- It was replaced layers with
  ---
  license: apache-2.0
  ---
+ # Model Architecture Overview
+
+ ## Architectures Included
+
+ I have added my empty models based on the following architectures:
+
+ - **GPT-3 Standard**
+ - **Llama 3**
+ - **Mistral**
+
+ For smaller models modeled after **GPT-2**, I utilize `LayerNorm` and `FFN` layers. For larger models, these layers are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter sizes (8B, 33B, 70B, and 120B).
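The `LayerNorm` → `RMSNorm` swap described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from this repo; shapes and the `eps` values are assumptions.

```python
# Sketch of the LayerNorm -> RMSNorm swap, using NumPy.
# Shapes and eps values are illustrative assumptions, not taken from this repo.
import numpy as np

def layer_norm(x, weight, bias, eps=1e-5):
    # Classic LayerNorm: subtract the mean, divide by the standard deviation.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight + bias

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: no mean subtraction and no bias; only the root-mean-square scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

d = 8
x = np.random.default_rng(0).standard_normal((2, d))
w, b = np.ones(d), np.zeros(d)
print(layer_norm(x, w, b).shape, rms_norm(x, w).shape)
```

Dropping the mean subtraction and the bias is what makes RMSNorm slightly cheaper, which is one reason Llama-style large models prefer it.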
+
+ ---
+
+ ## Tokenizer Choices
+
+ - For English models: **GPT-2 Hugging Face tokenizer**
+ - For multilingual models: **BERT tokenizer** from the Hugging Face library
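The two tokenizers above can be loaded with the Hugging Face `transformers` library. The checkpoint names (`gpt2`, `bert-base-multilingual-cased`) are common defaults and are my assumption; this repo may pin different checkpoints.

```python
# Loading the two tokenizers mentioned above via Hugging Face `transformers`.
# Checkpoint names are assumed defaults, not taken from this repo's config.
from transformers import AutoTokenizer

# English models: the GPT-2 byte-level BPE tokenizer.
en_tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Multilingual models: a multilingual BERT WordPiece tokenizer.
multi_tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

print(en_tokenizer.tokenize("Hello world"))
print(multi_tokenizer.tokenize("Hello world"))
```

Note that both calls download the tokenizer files from the Hub on first use.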
+
+ ---
+
+ ## Training and Tuning
+
+ The **Transformer block is not frozen**, providing greater flexibility and power when tuning models from scratch.
+
+ ---
+
+ ## Model Architecture Details
+
+ ### GPT-2 Architecture (Classic, Transformer-like)
+
+ ```
  CustomEmbedding
  FrozenSignatureLayer
  LearnedPositionalEmbedding
+ [TransformerBlock]
+ ├── MultiHeadAttention
+ ├── LayerNorm
+ ├── LayerNorm
+ └── FFN
+     ├── Linear
+     ├── Activation: GELU
+     └── Linear
  LayerNorm
  Linear
+ ```
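The `FFN` sub-tree in the diagram above (Linear → GELU → Linear) can be sketched in NumPy as follows. The hidden sizes are illustrative assumptions; GPT-2 conventionally uses an inner size of 4 × `d_model`.

```python
# Minimal sketch of the FFN inside the GPT-2-style block above
# (Linear -> GELU -> Linear). Sizes are illustrative assumptions.
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in the original GPT-2.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, w1, b1, w2, b2):
    # Expand to the hidden size, apply GELU, project back to the model size.
    return gelu(x @ w1 + b1) @ w2 + b2

d_model, d_ff = 8, 32  # GPT-2 convention: d_ff = 4 * d_model
rng = np.random.default_rng(0)
x = rng.standard_normal((2, d_model))
w1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)
print(ffn(x, w1, b1, w2, b2).shape)  # (2, 8)
```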
+
+ ---

+ ### GPT-3 Architecture (Similar to Llama 3 & Mistral)

+ ```
  CustomEmbedding
+ # Positional Embedding removed, RoPE integrated in Attention
+ [TransformerBlock]
+ ├── MultiHeadAttention
+ ├── SwiGLUFeedForward
+ │   ├── Linear (Gate Layer)
+ │   ├── Linear (Up Layer)
+ │   └── Linear (Projection/Down Layer)
+ └── RMSNorm
  RMSNorm
  Linear
  FrozenSignatureLayer
+ ```
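The comment in the diagram above notes that the learned positional embedding is removed and RoPE is applied inside attention instead. A minimal NumPy sketch of one common RoPE variant (the split-halves rotation used in Llama-style implementations) is below; the base of 10000 and the shapes are assumptions.

```python
# Sketch of rotary position embeddings (RoPE): instead of adding a learned
# positional embedding, query/key vectors are rotated inside attention.
# Split-halves variant; base=10000 follows the common convention.
import numpy as np

def apply_rope(x, positions, base=10000.0):
    # x: (seq_len, head_dim) with even head_dim; rotate paired coordinates.
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-np.arange(half) / half)      # (half,) per-pair frequencies
    angles = positions[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

x = np.random.default_rng(1).standard_normal((4, 8))
rotated = apply_rope(x, positions=np.arange(4))
```

Because the rotation depends only on position, relative offsets fall out naturally in the attention dot product, which is why no separate positional embedding table is needed.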
+
+ ---

+ **Note:** The large model architectures replace specific layers:
+ - `LayerNorm` → `RMSNorm`
+ - `FFN` → `SwiGLU`
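The `FFN` → `SwiGLU` replacement combines the gate, up, and down `Linear` layers from the diagram as `SwiGLU(x) = (SiLU(x·W_gate) * x·W_up)·W_down`. A NumPy sketch, with illustrative sizes:

```python
# Sketch of the FFN -> SwiGLU swap: the Gate, Up, and Projection (Down)
# Linear layers listed in the diagram above. Sizes are illustrative.
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # The SiLU-activated gate modulates the up projection elementwise,
    # then the result is projected back down to the model size.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d_model, d_ff = 8, 16
rng = np.random.default_rng(2)
x = rng.standard_normal((2, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (2, 8)
```

Unlike the two-layer GELU FFN, SwiGLU needs three weight matrices, so implementations typically shrink `d_ff` to keep the parameter count comparable.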
+
+ ---

+ ## License

+ CMS Manhattan
+ Copyright © 2002–2026