> ⚠️ Not in a working state yet; please wait.
# Custom GPT Model

This is a custom GPT model with:

- RMS normalization
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in the MLP
- QK normalization in attention
- Zero initialization for projection layers
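Two of the components above are easy to illustrate in isolation. The sketch below (plain NumPy, not the repository's actual implementation) shows RMS normalization, which rescales by the root-mean-square instead of subtracting a mean as LayerNorm does, and the squared ReLU activation used in the MLP:

```python
import numpy as np

def rms_norm(x, gamma=1.0, eps=1e-6):
    # RMS normalization: divide by the root-mean-square of the features,
    # then apply a learned scale (gamma). No mean-centering, no bias.
    return gamma * x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def squared_relu(x):
    # Squared ReLU: relu(x) ** 2, used here as the MLP activation.
    return np.maximum(x, 0.0) ** 2

x = np.array([3.0, -4.0])
print(rms_norm(x))      # output has (approximately) unit RMS
print(squared_relu(x))  # [9. 0.]
```

In the model itself these would operate on learned tensors, with `gamma` as a trainable per-feature parameter.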
## Architecture

- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768
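The per-head dimension follows directly from these numbers. A quick sketch (variable names are illustrative, not the repo's config class):

```python
vocab_size = 50304  # divisible by 128, a common padding choice for throughput
n_ctx = 1024
n_layer = 12
n_head = 6
n_embd = 768

# Each attention head works on an equal slice of the embedding.
head_dim = n_embd // n_head
print(head_dim)  # 128
```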
## Usage

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
```