Update README.md
README.md
CHANGED
@@ -6,6 +6,8 @@ My smaller GPT-2 models utilize LayerNorm and FFN layers, whereas for larger models
I have replaced these components with RMSNorm and SwiGLU. This adjustment allows a smoother transition to large model architectures, including models with 8B, 33B, 70B, and 120B parameters.
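The two swapped-in components can be sketched in plain Python. This is a minimal illustration of the math only (function names are mine; a real model would use a tensor library and include the FFN down-projection):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square of the activations.

    Unlike LayerNorm, it does not subtract the mean and has no bias term,
    which is the variant used in Llama-style large models.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    # SiLU (swish) activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def swiglu_ffn(x, w_gate, w_up):
    """SwiGLU feed-forward: a SiLU-gated branch multiplies a linear branch.

    A full block would project back down to the model dimension; that
    down-projection is omitted here to keep the sketch short.
    """
    gate = [silu(sum(w * v for w, v in zip(row, x))) for row in w_gate]
    up = [sum(w * v for w, v in zip(row, x)) for row in w_up]
    return [g * u for g, u in zip(gate, up)]
```

The key difference from the smaller models: RMSNorm drops LayerNorm's mean-centering and bias, and SwiGLU replaces the FFN's single activation with a learned gate.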
+ So please use the GPT-2 tokenizer from Hugging Face for English and, for multiple languages, the BERT tokenizer from the Hugging Face library.
+
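Loading the two tokenizers with the Hugging Face `transformers` library might look like this (the multilingual checkpoint name is my assumption; the README does not name one):

```python
from transformers import AutoTokenizer

# English text: the standard GPT-2 BPE tokenizer.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

# Multilingual text: a BERT tokenizer; "bert-base-multilingual-cased"
# is an assumed checkpoint, not specified in the README.
bert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

ids = gpt2_tok.encode("Hello world")
roundtrip = gpt2_tok.decode(ids)
```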
The transformer blocks are not frozen, which gives more power to tune the model from scratch.

My GPT-2 architecture is similar to the classic GPT-2 transformer.