    # Custom GPT Model
    
    This is a custom GPT model with the following modifications from standard GPT-2:
    - RMS normalization instead of LayerNorm
    - Rotary positional embeddings (RoPE)
    - Separate Q, K, and V projections
    - Squared ReLU activation in MLP
    - QK normalization in attention
    - Zero initialization for projection layers
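
To make three of the listed changes concrete, here is a minimal pure-Python sketch of RMS normalization, squared ReLU, and a rotary embedding applied to a single vector. It is an illustration only, not this model's actual code: the function names, the `eps` and `base` defaults, the omission of a learned gain, and the half-split pairing used in `rope` are all assumptions.

```python
import math

def rms_norm(x, eps=1e-6):
    """Divide a vector by its root-mean-square (no mean subtraction; learned gain omitted)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def squared_relu(x):
    """Elementwise max(0, v) ** 2."""
    return [max(0.0, v) ** 2 for v in x]

def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i + d/2]) by an angle that grows with position.

    The half-split pairing is an assumption; some implementations pair
    adjacent elements instead.
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        theta = position * base ** (-i / half)  # per-pair rotation frequency
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out
```

Because `rope` only rotates pairs of coordinates, it leaves the vector's length unchanged, and at position 0 it returns the input as is.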
    
    ## Model Architecture
    - Vocabulary Size: 50304
    - Context Length: 1024
    - Number of Layers: 12
    - Number of Heads: 6
    - Embedding Dimension: 768
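
A few quantities follow directly from these numbers. The sketch below collects them in a hypothetical `ModelConfig` dataclass (the class and field names are illustrative, not taken from the model's code) and derives the per-head dimension and the size of the token embedding table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    # Values from the architecture list; field names are hypothetical.
    vocab_size: int = 50304
    context_length: int = 1024
    n_layer: int = 12
    n_head: int = 6
    n_embd: int = 768

    @property
    def head_dim(self) -> int:
        # The embedding dimension is split evenly across attention heads.
        assert self.n_embd % self.n_head == 0
        return self.n_embd // self.n_head

    @property
    def token_embedding_params(self) -> int:
        # One n_embd-sized vector per vocabulary entry.
        return self.vocab_size * self.n_embd

config = ModelConfig()
print(config.head_dim)                # 128
print(config.token_embedding_params)  # 38633472
print(config.vocab_size % 128)        # 0
```

Each head therefore works with 128-dimensional queries and keys. The vocabulary size is a multiple of 128; if the model uses the GPT-2 tokenizer (an assumption), 50304 would be its 50257 entries rounded up to that multiple.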
    
    ## Usage
    ```python
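    from transformers import AutoModel

    # Download the checkpoint from the Hugging Face Hub and load it.
    # If the repository ships custom modeling code, passing trust_remote_code=True may be required.
    model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")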
    ```