> ⚠️ Not in a working state yet; please wait.
# Custom GPT Model

This is a custom GPT model with:

- RMS normalization
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in the MLP
- QK normalization in attention
- Zero initialization for projection layers
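Two of the components above are easy to illustrate in isolation. The sketch below (plain NumPy, not the repository's actual implementation) shows RMS normalization, which rescales by the root-mean-square instead of subtracting a mean as LayerNorm does, and the squared ReLU activation used in the MLP:

```python
import numpy as np

def rms_norm(x, gamma=1.0, eps=1e-6):
    # RMS normalization: divide by the root-mean-square of the features,
    # then apply a learned scale (gamma). No mean-centering, no bias.
    return gamma * x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def squared_relu(x):
    # Squared ReLU: relu(x) ** 2, used here as the MLP activation.
    return np.maximum(x, 0.0) ** 2

x = np.array([3.0, -4.0])
print(rms_norm(x))      # output has (approximately) unit RMS
print(squared_relu(x))  # [9. 0.]
```

In the model itself these would operate on learned tensors, with `gamma` as a trainable per-feature parameter.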
## Architecture

- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768
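The per-head dimension follows directly from these numbers. A quick sketch (variable names are illustrative, not the repo's config class):

```python
vocab_size = 50304  # divisible by 128, a common padding choice for throughput
n_ctx = 1024
n_layer = 12
n_head = 6
n_embd = 768

# Each attention head works on an equal slice of the embedding.
head_dim = n_embd // n_head
print(head_dim)  # 128
```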
## Usage

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
```