---
license: apache-2.0
---
I have added my empty models, built on the standard GPT-3 architecture as well as the Llama 3 and Mistral architectures.

My smaller GPT-2 models use LayerNorm and standard FFN layers, whereas in the larger models I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures, including models with 8B, 33B, 70B, and 120B parameters.
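To illustrate the LayerNorm-to-RMSNorm swap mentioned above, here is a minimal PyTorch sketch (not this repo's actual code): RMSNorm drops LayerNorm's mean-centering and bias term and rescales only by the root mean square, which is cheaper and is what Llama- and Mistral-style models use.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Illustrative RMSNorm: normalize by root mean square, then scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No mean subtraction (unlike LayerNorm), only RMS rescaling.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```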

The transformer blocks are not frozen, which gives more freedom to tune the model from scratch.

My GPT-2 architecture is similar to the classic GPT-2 transformer:

    CustomEmbedding
    FrozenSignatureLayer
    LearnedPositionalEmbedding
    TransformerBlock -> MultiHeadAttention
    TransformerBlock -> LayerNorm
    TransformerBlock -> LayerNorm
    TransformerBlock -> ffn --> Linear
    TransformerBlock -> ffn --> Activation::gelu
    TransformerBlock -> ffn --> Linear
    LayerNorm
    Linear
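The block structure listed above (two LayerNorms, multi-head attention, and a Linear -> GELU -> Linear FFN) can be sketched in PyTorch as follows; this is a generic pre-norm GPT-2-style block for illustration, and the class and layer names are assumptions, not this repo's actual implementation.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of a pre-norm GPT-2-style block: LN -> MHA -> LN -> FFN."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # norm before attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)   # norm before the FFN
        self.ffn = nn.Sequential(          # Linear -> GELU -> Linear
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                   # residual around attention
        x = x + self.ffn(self.ln2(x))      # residual around the FFN
        return x
```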


My GPT-3 architecture is similar to LLaMA 3 and Mistral:

    CustomEmbedding
    # Positional embedding removed; RoPE is integrated in the attention
    TransformerBlock -> MultiHeadAttention
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Gate Layer
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Up Layer
    TransformerBlock -> SwiGLUFeedForward -> Linear  # Projection (Down) Layer
    TransformerBlock -> RMSNorm
    RMSNorm
    Linear
    FrozenSignatureLayer
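The three Linear layers of the SwiGLU feed-forward listed above (gate, up, and down projection) can be sketched as follows; this is a generic Llama-style SwiGLU for illustration, with class and attribute names assumed rather than taken from this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Sketch of SwiGLU: down( silu(gate(x)) * up(x) )."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, hidden, bias=False)  # Gate Layer
        self.up = nn.Linear(d_model, hidden, bias=False)    # Up Layer
        self.down = nn.Linear(hidden, d_model, bias=False)  # Projection (Down)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated branch multiplied elementwise with the up branch.
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

Compared with the GPT-2 FFN's single GELU branch, the gated form adds a third projection but tends to train better at scale, which is why Llama- and Mistral-family models adopted it.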

CMS Manhattan, Copyright (c) 2002-2026

The layers were replaced with