Update README.md
README.md
CHANGED
@@ -6,6 +6,8 @@ My smaller GPT-2 models utilize LayerNorm and FFN layers, whereas for larger models
I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures, including models with 8B, 33B, 70B, and 120B parameters.
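The two swapped-in components can be sketched in plain Python. This is a minimal illustration of the math only (function names are mine; a real model would use a tensor library and include the FFN down-projection):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square of the activations.

    Unlike LayerNorm, it does not subtract the mean and has no bias term,
    which is the variant used in Llama-style large models.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    # SiLU (swish) activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def swiglu_ffn(x, w_gate, w_up):
    """SwiGLU feed-forward: a SiLU-gated branch multiplies a linear branch.

    A full block would project back down to the model dimension; that
    down-projection is omitted here to keep the sketch short.
    """
    gate = [silu(sum(w * v for w, v in zip(row, x))) for row in w_gate]
    up = [sum(w * v for w, v in zip(row, x)) for row in w_up]
    return [g * u for g, u in zip(gate, up)]
```

The key difference from the smaller models: RMSNorm drops LayerNorm's mean-centering and bias, and SwiGLU replaces the FFN's single activation with a learned gate.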
+ So please use the GPT-2 tokenizer from Hugging Face for English and, for multiple languages, the BERT tokenizer from the Hugging Face library.
+
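Loading the two tokenizers with the Hugging Face `transformers` library might look like this (the multilingual checkpoint name is my assumption; the README does not name one):

```python
from transformers import AutoTokenizer

# English text: the standard GPT-2 BPE tokenizer.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

# Multilingual text: a BERT tokenizer; "bert-base-multilingual-cased"
# is an assumed checkpoint, not specified in the README.
bert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

ids = gpt2_tok.encode("Hello world")
roundtrip = gpt2_tok.decode(ids)
```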
The transformer blocks are not frozen, which gives more power to tune the model from scratch.

My GPT-2 architecture is similar to the classic GPT-2 transformer.