---
language: en
tags:
- egpt
- llama-architecture
- decoder-only
- untrained
license: mit
---
# eGPT-100M-bytes-untrained

Randomly initialized eGPT decoder-only model (94.9M parameters). **Not trained.**

## Architecture

| Field | Value |
|---|---|
| Parameters | 94.9M |
| Layers | 8 |
| Dim | 1024 |
| Heads (Q) | 8 |
| Heads (KV) | 4 |
| Head dim | 128 |
| FFN hidden | 2816 |
| Max seq len | 2048 |
| Vocab size | 256 |
| Tokenizer | `google/byt5-small` |
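
The parameter count can be roughly reproduced from the other rows of the table. The sketch below is an assumption-laden check, not the repository's own accounting: it assumes a Llama-style block with no bias terms, a gated FFN with three projection matrices, two norm weight vectors per layer plus one final norm, and input and output embeddings that are not tied. None of these details are stated in this card.

```python
# Architecture values copied from the table above.
dim, n_layers = 1024, 8
n_q_heads, n_kv_heads, head_dim = 8, 4, 128
ffn_hidden, vocab = 2816, 256

# Assumptions (not stated in the card): no biases, gated FFN with three
# projections, two norm weights per layer plus a final norm, untied embeddings.
attn = dim * n_q_heads * head_dim           # Q projection
attn += 2 * dim * n_kv_heads * head_dim     # K and V projections (4 KV heads)
attn += n_q_heads * head_dim * dim          # output projection
ffn = 3 * dim * ffn_hidden                  # gate, up, and down projections
norms = 2 * dim                             # two norm weight vectors per layer
per_layer = attn + ffn + norms

embed = vocab * dim                         # input embedding
lm_head = vocab * dim                       # output head (assumed untied)
total = n_layers * per_layer + embed + lm_head + dim  # + final norm

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 94,913,536 parameters (~94.9M)
```

Under these assumptions the total matches the 94.9M in the table. With tied embeddings the same arithmetic would give about 94.7M, so the figure is at least consistent with separate input and output matrices.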

## Loading

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "LLMsHub/eGPT-100M-bytes-untrained"

# trust_remote_code=True allows transformers to execute custom code from the repository.
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
# The loaded weights are the random initialization; the model has not been trained.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```
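
Since the weights are untrained, a useful reference point after loading is the loss of a model that predicts every token with equal probability. The sketch below computes only that baseline from the vocab size in the architecture table. Whether this checkpoint actually starts close to it depends on its initialization scale, which this card does not state, so treat the comparison as an assumption.

```python
import math

vocab_size = 256  # from the architecture table

# Cross-entropy of a uniform distribution over the vocabulary.
uniform_loss_nats = math.log(vocab_size)
uniform_loss_bits = math.log2(vocab_size)

print(f"{uniform_loss_nats:.3f} nats/token, {uniform_loss_bits:.1f} bits/token")
# 5.545 nats/token, 8.0 bits/token
```

A measured cross-entropy far above this value on arbitrary text would suggest large initial logits rather than a loading problem.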