Custom GPT Model

This is a custom GPT model with:

  • RMS normalization
  • Rotary positional embeddings (RoPE)
  • Separate Q, K, V projections
  • Squared ReLU activation in the MLP
  • QK normalization in attention
  • Zero initialization for projection layers
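The following sketch shows how the components listed above fit together. It is a minimal, dependency-free example in plain Python (lists of floats, a single attention head) and is not this model's actual code. The function names, the `eps` value, the RoPE base of 10000, the "rotate-half" pairing of dimensions, applying the RMS norm to q and k before the rotation, and the 1/sqrt(d) attention scale are all assumptions made for illustration.

```python
import math


def rms_norm(x, eps=1e-6):
    # Scale x by the reciprocal of its root-mean-square.
    # No mean subtraction and no learned gain in this sketch (ASSUMPTION: eps value).
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def squared_relu(x):
    # Squared ReLU: max(0, v) ** 2, applied elementwise (the MLP activation).
    return [max(0.0, v) ** 2 for v in x]


def apply_rope(x, pos, base=10000.0):
    # Rotary embedding: rotate each pair (x[i], x[i + d/2]) by pos * base ** (-2i / d).
    # ASSUMPTION: "rotate-half" pairing and base 10000; needs an even dimension.
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def linear(x, weight):
    # y = W x, where weight is a list of rows.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def zeros(rows, cols):
    # Zero initialization: a projection built from this outputs all zeros at first.
    return [[0.0] * cols for _ in range(rows)]


def causal_attention(xs, w_q, w_k, w_v):
    # Single-head causal attention with separate Q, K, V projections,
    # QK normalization (RMS norm on q and k) and RoPE applied to q and k.
    qs = [apply_rope(rms_norm(linear(x, w_q)), t) for t, x in enumerate(xs)]
    ks = [apply_rope(rms_norm(linear(x, w_k)), t) for t, x in enumerate(xs)]
    vs = [linear(x, w_v) for x in xs]
    scale = 1.0 / math.sqrt(len(qs[0]))  # ASSUMPTION: standard 1/sqrt(d) scale
    outputs = []
    for t, q in enumerate(qs):
        # Position t attends only to positions 0..t (causal mask).
        logits = [scale * sum(a * b for a, b in zip(q, ks[j])) for j in range(t + 1)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(z - m) for z in logits]
        total = sum(weights)
        outputs.append(
            [sum(weights[j] / total * vs[j][i] for j in range(t + 1)) for i in range(len(vs[0]))]
        )
    return outputs
```

Passing an attention or MLP output through a zero-initialized projection, e.g. `linear(out, zeros(d, d))`, makes that residual branch contribute nothing at the start of training, which is the usual motivation for zero-initializing projection layers. Because RoPE rotates each pair by an angle proportional to its position, the q·k dot product depends only on the offset between the two positions.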

Architecture

  • Vocabulary Size: 50304
  • Context Length: 1024
  • Number of Layers: 12
  • Number of Heads: 6
  • Embedding Dimension: 768
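The hyperparameters above can be collected into a small config object, which also makes the derived per-head dimension explicit: 768 / 6 = 128. The sketch below is illustrative only. The field names are hypothetical, and the parameter estimate rests on ASSUMPTIONS this README does not state: no biases, an MLP hidden size of 4 × 768, RMSNorm gains ignored, and an optionally tied embedding/output head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    # Values from the Architecture list; the field names are hypothetical.
    vocab_size: int = 50304
    block_size: int = 1024  # context length
    n_layer: int = 12
    n_head: int = 6
    n_embd: int = 768

    @property
    def head_dim(self) -> int:
        # Each head attends in an n_embd / n_head dimensional subspace.
        assert self.n_embd % self.n_head == 0, "n_embd must divide evenly across heads"
        return self.n_embd // self.n_head


def approx_param_count(cfg: GPTConfig, mlp_ratio: int = 4, tied_head: bool = False) -> int:
    # Rough weight count under ASSUMED choices: no biases, MLP hidden size of
    # mlp_ratio * n_embd, RMSNorm gains ignored, RoPE adds no parameters.
    d = cfg.n_embd
    attn = 4 * d * d  # separate Q, K, V projections plus the output projection
    mlp = 2 * d * (mlp_ratio * d)  # up-projection and down-projection
    embed = cfg.vocab_size * d  # token embedding table
    head = 0 if tied_head else cfg.vocab_size * d  # output (LM head) matrix
    return cfg.n_layer * (attn + mlp) + embed + head
```

Under these assumptions, `approx_param_count(GPTConfig(), tied_head=True)` returns 123,568,128, and 162,201,600 with an untied head. The true count depends on details not listed here.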

Usage

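from transformers import AutoModel

# Download the weights from the Hugging Face Hub. Because this is a custom architecture,
# loading may also require trust_remote_code=True if the repository ships its own modeling code.
model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")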