Upload folder using huggingface_hub

Files changed (3)

README.md ADDED

+    # Custom GPT Model
+    This is a custom GPT model with the following modifications from standard GPT-2:
+    - RMS normalization instead of LayerNorm
+    - Rotary positional embeddings (RoPE)
+    - Separate Q, K, V projections
+    - Squared ReLU activation in MLP
+    - QK normalization in attention
+    - Zero initialization for projection layers
+    ## Model Architecture
+    - Vocabulary Size: 50304
+    - Context Length: 1024
+    - Number of Layers: 12
+    - Number of Heads: 6
+    - Embedding Dimension: 768
+    ## Usage
+    ```python
+    from transformers import AutoModel
+    model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
+    ```
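
The modifications listed in the README can be illustrated with a small, dependency-free sketch. This is not the repository's implementation, since no modeling code is part of this commit. The function names `rms_norm`, `squared_relu`, `rope_rotate`, and `qk_norm_scores`, the `eps` value, the RoPE base of 10000, the adjacent-pair rotation layout, and the `1/sqrt(d)` score scaling are all assumptions chosen for illustration.

```python
import math

def rms_norm(x, eps=1e-6):
    """Scale x so its root-mean-square is ~1 (no mean subtraction, no learned gain)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def squared_relu(x):
    """ReLU followed by squaring: max(0, v) ** 2."""
    return [max(0.0, v) ** 2 for v in x]

def rope_rotate(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d).

    The pairing layout is an assumption; some implementations pair the
    first half of the vector with the second half instead.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def qk_norm_scores(q, k):
    """QK normalization (assumed form): RMS-normalize q and k, then take a scaled dot product."""
    qn, kn = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))
```

Because RoPE is a pure rotation, it leaves the vector norm unchanged and is the identity at position 0, which makes it easy to sanity-check independently of the rest of the model.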

config.json ADDED

+{
+  "_attn_implementation_autoset": true,
+  "architectures": [
+    "CustomGPTPreTrainedModel"
+  ],
+  "block_size": 1024,
+  "model_type": "custom_gpt",
+  "n_embd": 768,
+  "n_head": 6,
+  "n_layer": 12,
+  "tokenizer_class": "GPT2Tokenizer",
+  "transformers_version": "4.48.1",
+  "vocab_size": 50304
+}
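
As a quick consistency check on the config above, the following sketch derives the per-head dimension and a couple of related facts. The helper name `derive_shapes` and the keys of the returned dictionary are hypothetical; only the config values come from this commit. Note also that `custom_gpt` is not a built-in `transformers` model type, so loading through the Auto classes would presumably require the custom config and model classes to be registered or shipped with the repository.

```python
import json

# Subset of the config.json added in this commit.
CONFIG_JSON = """
{
  "block_size": 1024,
  "model_type": "custom_gpt",
  "n_embd": 768,
  "n_head": 6,
  "n_layer": 12,
  "vocab_size": 50304
}
"""

GPT2_TOKENIZER_VOCAB = 50257  # size of the standard GPT-2 vocabulary

def derive_shapes(cfg):
    """Return shape facts implied by the config (hypothetical helper)."""
    if cfg["n_embd"] % cfg["n_head"] != 0:
        raise ValueError("n_embd must be divisible by n_head")
    head_dim = cfg["n_embd"] // cfg["n_head"]
    return {
        "head_dim": head_dim,
        # RoPE rotates dimensions in pairs, so head_dim should be even.
        "rope_compatible": head_dim % 2 == 0,
        # 50304 is a multiple of 128; padding the vocabulary this way is a
        # common efficiency trick, though the intent here is an assumption.
        "vocab_multiple_of_128": cfg["vocab_size"] % 128 == 0,
        # Embedding rows beyond the GPT-2 tokenizer's vocabulary.
        "extra_rows_vs_gpt2": cfg["vocab_size"] - GPT2_TOKENIZER_VOCAB,
    }

shapes = derive_shapes(json.loads(CONFIG_JSON))
print(shapes)
```

With 768 embedding dimensions split over 6 heads, each head is 128-dimensional, wider than the 64-dimensional heads of the 12-head GPT-2 small configuration.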

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:a9b9241bfa5721a46c8186e18b74637299de0857ed13679a524e85dac34e08d0
+size 494301897
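
The LFS size can be cross-checked against the config with some back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, none of which are confirmed by this commit: weights stored as float32, no bias terms, no learnable normalization gains, a 4x MLP expansion, and an output head that shares its weights with the token embedding. The function name `estimate_params` and the `mlp_ratio` parameter are hypothetical.

```python
def estimate_params(vocab_size=50304, n_embd=768, n_layer=12, mlp_ratio=4):
    """Rough parameter count under the assumptions stated above."""
    embedding = vocab_size * n_embd              # token embedding (tied with output head)
    attention = 4 * n_embd * n_embd              # separate Q, K, V plus output projection
    mlp = 2 * n_embd * (mlp_ratio * n_embd)      # up-projection and down-projection
    return embedding + n_layer * (attention + mlp)

LFS_SIZE_BYTES = 494301897  # "size" field of the LFS pointer above

params = estimate_params()
fp32_bytes = params * 4                  # 4 bytes per float32 weight
overhead = LFS_SIZE_BYTES - fp32_bytes   # remainder: serialization metadata, if the assumptions hold

print(f"estimated parameters: {params:,}")
print(f"estimated fp32 bytes: {fp32_bytes:,}")
print(f"unexplained bytes:    {overhead:,}")
```

Under these assumptions the estimate is 123,568,128 parameters, or 494,272,512 bytes in float32, which is within about 30 kB of the reported file size. An untied output head would add another 38,633,472 parameters and would not fit the reported size at float32.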