# Custom GPT Model

This is a custom GPT model with the following modifications from standard GPT-2:
- RMS normalization instead of LayerNorm
- Rotary positional embeddings (RoPE)
- Separate Q, K, and V projections (GPT-2 uses a single fused QKV projection)
- Squared ReLU activation in MLP
- QK normalization in attention
- Zero initialization for projection layers
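
To make three of the items above concrete, here is a minimal plain-Python sketch of RMS normalization, squared ReLU, and rotary embeddings on a single vector. It is an illustration only, not this model's code: the function names, the `eps` value, the RoPE `base` of 10000, the interleaved pairing of dimensions, and the absence of a learned gain in the norm are all assumptions.

```python
import math

def rms_norm(x, eps=1e-6):
    """Divide a vector by its root-mean-square (no mean subtraction).

    Assumption: no learned gain; eps is an illustrative value.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def squared_relu(x):
    """Apply ReLU, then square the result: max(0, v) ** 2."""
    return [max(0.0, v) ** 2 for v in x]

def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by an angle that grows with position.

    Assumption: interleaved pairing and base=10000; some implementations
    pair the first half of the vector with the second half instead.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # i = 2k, so this is base^(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

# QK normalization, under the same assumptions: normalize the query and key
# before taking their dot product.
q = rope(rms_norm([1.0, 2.0, 3.0, 4.0]), position=5)
k = rope(rms_norm([0.5, -1.0, 2.0, 0.0]), position=3)
score = sum(a * b for a, b in zip(q, k))
```

Because RoPE only rotates pairs of coordinates, it leaves the vector's length unchanged, so normalizing before or after the rotation yields the same norm.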

## Model Architecture
- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768
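
The values above imply a per-head dimension of 768 / 6 = 128. A small sketch that records them and derives that number; the class and field names are hypothetical and not taken from the model's code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPTConfig:  # hypothetical name, for illustration only
    vocab_size: int = 50304
    context_length: int = 1024
    n_layers: int = 12
    n_heads: int = 6
    d_model: int = 768

    @property
    def head_dim(self) -> int:
        # The embedding dimension must split evenly across the heads.
        assert self.d_model % self.n_heads == 0
        return self.d_model // self.n_heads

config = GPTConfig()
print(config.head_dim)  # 128
```

The vocabulary size of 50304 is a multiple of 64, which suggests (as an assumption) that GPT-2's 50257-token vocabulary was padded for hardware efficiency.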

## Usage
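```python
from transformers import AutoModel
# Custom architectures usually need trust_remote_code=True (assumes the repo ships its modeling code).
model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k", trust_remote_code=True)
```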