
	---
	language: en
	tags:
	- egpt
	- llama-architecture
	- decoder-only
	- untrained
	license: mit
	---

	# eGPT-500M-untrained

	A randomly initialized eGPT decoder-only model with 542.2M parameters. The weights have not been trained, so the model's outputs are not meaningful until it is trained.

	## Architecture

	| Field | Value |
	|---|---|
	| Parameters | 542.2M |
	| Layers | 12 |
	| Dim | 2048 |
	| Heads (Q) | 16 |
	| Heads (KV) | 4 |
	| Head dim | 128 |
	| FFN hidden | 5632 |
	| Max seq len | 2048 |
	| Vocab size | 256 |
	| Tokenizer | `google/byt5-small` |
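
The parameter total can be roughly cross-checked against the other fields in the table. The sketch below is a back-of-the-envelope count, not the repository's own code. It assumes a Llama-style layout that the card does not spell out: bias-free projections, a three-matrix gated FFN, two norm weights per layer plus a final norm, and untied input and output embeddings. The function name `count_params` is hypothetical.

```python
# Back-of-the-envelope parameter count from the Architecture table.
# Assumptions (not stated in the card): Llama-style layers with no biases,
# a three-matrix gated FFN, two norm weight vectors per layer plus a final
# norm, and untied input/output embeddings.

def count_params(layers, dim, n_q_heads, n_kv_heads, head_dim, ffn_hidden, vocab):
    q_proj = dim * n_q_heads * head_dim        # query projection
    kv_proj = 2 * dim * n_kv_heads * head_dim  # key and value projections (grouped-query attention)
    o_proj = n_q_heads * head_dim * dim        # attention output projection
    attn = q_proj + kv_proj + o_proj
    ffn = 3 * dim * ffn_hidden                 # gate, up, and down matrices
    norms = 2 * dim                            # pre-attention and pre-FFN norm weights
    per_layer = attn + ffn + norms
    embeddings = 2 * vocab * dim               # input embedding + separate output head
    final_norm = dim
    return layers * per_layer + embeddings + final_norm


total = count_params(
    layers=12, dim=2048, n_q_heads=16, n_kv_heads=4,
    head_dim=128, ffn_hidden=5632, vocab=256,
)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
# prints: 542,164,992 parameters (542.2M)
```

Under these assumptions the count agrees with the 542.2M in the table. With a byte-level vocabulary of 256, the embeddings contribute only about 1M parameters, so nearly all of the total sits in the 12 transformer layers.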

	## Loading

	```python
	from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

	repo_id = "LLMsHub/eGPT-500M-untrained"

	# trust_remote_code=True allows transformers to run custom code shipped in the repo.
	tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
	cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
	```