
---
language: en
tags:
- egpt
- llama-architecture
- decoder-only
- untrained
license: mit
---

# eGPT-100M-bytes-untrained

Randomly initialized eGPT decoder-only model (94.9M parameters). The weights have not been trained.
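Because the weights are random, next-byte predictions at initialization are typically close to uniform over the vocabulary (assuming a standard small-scale weight initialization), so a freshly initialized model's cross-entropy should land roughly near ln(256). The sketch below only computes that uniform baseline; the helper name `uniform_cross_entropy` is hypothetical and not part of the repo.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy in nats of a uniform prediction over vocab_size tokens."""
    # -log(1 / V) simplifies to log(V)
    return math.log(vocab_size)


VOCAB_SIZE = 256  # vocab size from the architecture table
loss = uniform_cross_entropy(VOCAB_SIZE)
# Convert nats to bits by dividing by ln(2)
print(f"{loss:.3f} nats, {loss / math.log(2):.1f} bits per byte")
```

Real random logits are not exactly uniform, so the measured initial loss is usually somewhat above this value rather than equal to it.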

## Architecture

| Field | Value |
|---|---|
| Parameters | 94.9M |
| Layers | 8 |
| Hidden size (dim) | 1024 |
| Query heads | 8 |
| Key/value heads | 4 |
| Head dim | 128 |
| FFN hidden size | 2816 |
| Max sequence length | 2048 |
| Vocab size | 256 |
| Tokenizer | `google/byt5-small` |
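The table's numbers are internally consistent: head dim is 1024 / 8 = 128, and with 4 key/value heads each KV head is shared by 2 query heads (grouped-query attention). The sketch below reproduces the 94.9M total under assumptions not stated in this card: a standard LLaMA-style block with bias-free projections, a SwiGLU FFN (three weight matrices), two RMSNorm weight vectors per layer plus a final norm, and untied input/output embeddings. The function `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(dim: int, n_layers: int, n_heads: int, n_kv_heads: int,
                      ffn_hidden: int, vocab_size: int,
                      tied_embeddings: bool = False) -> int:
    """Count parameters of a LLaMA-style decoder (assumed layout, no biases)."""
    head_dim = dim // n_heads
    kv_dim = n_kv_heads * head_dim  # GQA: K and V project to a narrower width
    # Attention: W_q and W_o are dim x dim; W_k and W_v are dim x kv_dim
    attn = 2 * dim * dim + 2 * dim * kv_dim
    # SwiGLU FFN: gate and up (dim -> ffn_hidden) plus down (ffn_hidden -> dim)
    ffn = 3 * dim * ffn_hidden
    # RMSNorm weights before attention and before the FFN
    norms = 2 * dim
    per_layer = attn + ffn + norms
    embed = vocab_size * dim
    lm_head = 0 if tied_embeddings else vocab_size * dim
    final_norm = dim
    return n_layers * per_layer + embed + lm_head + final_norm


total = llama_param_count(dim=1024, n_layers=8, n_heads=8, n_kv_heads=4,
                          ffn_hidden=2816, vocab_size=256)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the count is 94,913,536 (94.9M), matching the table; tying the input and output embeddings would drop it to about 94.7M. Almost all parameters sit in the transformer blocks, since a 256-entry byte vocabulary makes the embedding matrices tiny.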

## Loading

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "LLMsHub/eGPT-100M-bytes-untrained"

# trust_remote_code=True allows transformers to run any custom code defined in the repo
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
# Reuse the loaded config; the weights are the random initialization, not a trained checkpoint
model = AutoModelForCausalLM.from_pretrained(repo_id, config=cfg, trust_remote_code=True)
```
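The model works on bytes (vocab size 256), so any UTF-8 text decomposes into values from 0 to 255. The sketch below shows only the raw UTF-8 byte values; the `google/byt5-small` tokenizer may shift these ids or append special tokens, so compare against `tok("...")["input_ids"]` before relying on exact ids. The helper `utf8_bytes` is hypothetical and not part of the repo.

```python
def utf8_bytes(text: str) -> list[int]:
    """Return the raw UTF-8 byte values of text (hypothetical helper)."""
    return list(text.encode("utf-8"))


ids = utf8_bytes("héllo")
print(ids)  # [104, 195, 169, 108, 108, 111]: "é" takes two bytes
# Every byte value fits in a 256-entry vocabulary
assert all(0 <= b < 256 for b in ids)
```

Non-ASCII characters expand to several bytes, so a given text produces more positions than characters, which counts against the 2048-position maximum sequence length.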