---
license: apache-2.0
---
I have added my empty models, built on the standard GPT-3 architecture as well as the Llama 3 and Mistral architectures.

My smaller GPT-2 models use LayerNorm and standard FFN layers, whereas in the larger models I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures, including models with 8B, 33B, 70B, and 120B parameters.
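To illustrate the LayerNorm-to-RMSNorm swap mentioned above, here is a minimal PyTorch sketch (not this repo's actual code): RMSNorm drops LayerNorm's mean-centering and bias term and rescales only by the root mean square, which is cheaper and is what Llama- and Mistral-style models use.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Illustrative RMSNorm: normalize by root mean square, then scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No mean subtraction (unlike LayerNorm), only RMS rescaling.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```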

The transformer blocks are not frozen, which gives more freedom to tune the model from scratch.

My GPT-2 architecture is similar to the classic GPT-2 transformer:

    CustomEmbedding
    FrozenSignatureLayer
    LearnedPositionalEmbedding
    TransformerBlock -> MultiHeadAttention
    TransformerBlock -> LayerNorm
    TransformerBlock -> LayerNorm
    TransformerBlock -> ffn --> Linear
    TransformerBlock -> ffn --> Activation::gelu
    TransformerBlock -> ffn --> Linear
    LayerNorm
    Linear
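The block structure listed above (two LayerNorms, multi-head attention, and a Linear -> GELU -> Linear FFN) can be sketched in PyTorch as follows; this is a generic pre-norm GPT-2-style block for illustration, and the class and layer names are assumptions, not this repo's actual implementation.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of a pre-norm GPT-2-style block: LN -> MHA -> LN -> FFN."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # norm before attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)   # norm before the FFN
        self.ffn = nn.Sequential(          # Linear -> GELU -> Linear
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                   # residual around attention
        x = x + self.ffn(self.ln2(x))      # residual around the FFN
        return x
```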


My GPT-3 architecture is similar to LLaMA 3 and Mistral:

    CustomEmbedding
    # Positional embedding removed; RoPE is integrated in the attention
    TransformerBlock -> MultiHeadAttention
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Gate Layer
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Up Layer
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Projection (Down) Layer
    TransformerBlock -> RMSNorm
    RMSNorm
    Linear
    FrozenSignatureLayer
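The three Linear layers of the SwiGLU feed-forward listed above (gate, up, and down projection) can be sketched as follows; this is a generic Llama-style SwiGLU for illustration, with class and attribute names assumed rather than taken from this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Sketch of SwiGLU: down( silu(gate(x)) * up(x) )."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, hidden, bias=False)  # Gate Layer
        self.up = nn.Linear(d_model, hidden, bias=False)    # Up Layer
        self.down = nn.Linear(hidden, d_model, bias=False)  # Projection (Down)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated branch multiplied elementwise with the up branch.
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

Compared with the GPT-2 FFN's single GELU branch, the gated form adds a third projection but tends to train better at scale, which is why Llama- and Mistral-family models adopted it.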

CMS Manhattan, Copyright (c) 2002-2026

The layers were replaced with