---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- pytorch
- research
- llama
---
# advanced-transformers-lib -- Llama 3 Baseline
A Llama 3-style decoder-only transformer architecture for research. No pretrained
weights -- pull the architecture from the Hub and instantiate a freshly initialised
model from config. Override any parameter at instantiation time.
> **Important:** `trust_remote_code=True` is required. It downloads the architecture
> source files from the Hub and imports them into your Python process. Review the
> source at [smithblack-0/llama3_baseline_dev](https://huggingface.co/smithblack-0/llama3_baseline_dev) before use.
## Usage
```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Pull architecture config -- override any parameter at instantiation time
config = AutoConfig.from_pretrained(
    "smithblack-0/llama3_baseline_dev",
    trust_remote_code=True,
    num_hidden_layers=16,  # example override
)

# Instantiate with fresh random weights -- no checkpoint required
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("smithblack-0/llama3_baseline_dev")

# Save and reload after training
model.save_pretrained("./checkpoint")
model = AutoModelForCausalLM.from_pretrained("./checkpoint", trust_remote_code=True)
```
## Default Configuration
| Parameter | Default |
|-----------|---------|
| `vocab_size` | 50277 |
| `hidden_size` | 768 |
| `intermediate_size` | 1568 |
| `num_hidden_layers` | 24 |
| `num_attention_heads` | 16 |
| `num_key_value_heads` | 4 |
| `head_dim` | 48 |
| `max_position_embeddings` | 8192 |
| `rope_theta` | 500000.0 |
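
The defaults above are related to one another, and a few derived quantities can help when choosing overrides. The following is a minimal sketch in plain Python, not code from this repository. It assumes a conventional grouped-query attention layout and a KV cache stored in 2-byte elements (fp16 or bf16); both are assumptions about how the model is run, not statements from the architecture source.

```python
# Default values copied from the table above.
hidden_size = 768
num_hidden_layers = 24
num_attention_heads = 16
num_key_value_heads = 4
max_position_embeddings = 8192

# The default head_dim equals hidden_size split evenly across attention heads.
head_dim = hidden_size // num_attention_heads

# Grouped-query attention: number of query heads sharing each key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Assumption: a conventional KV cache holding one key and one value vector
# per KV head per layer, stored in 2-byte (fp16/bf16) elements.
bytes_per_element = 2
kv_bytes_per_token = (
    2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_element
)
kv_mib_full_context = kv_bytes_per_token * max_position_embeddings / 2**20

print(head_dim, queries_per_kv_head, kv_bytes_per_token, kv_mib_full_context)
# prints: 48 4 18432 144.0
```

Under these assumptions, one sequence at the full 8192-token context needs about 144 MiB of KV cache. When overriding `hidden_size` or `num_attention_heads`, it may be worth checking that `head_dim` still has the intended value.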
## License
MIT. Clean-room synthesis: the human author has not read the Llama source code.
Architectural decisions derive from the published paper. The tokenizer is GPT-NeoX
(`EleutherAI/gpt-neox-20b`, Apache 2.0).