---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- pytorch
- research
- llama
---

# advanced-transformers-lib -- Llama 3 Baseline

A Llama 3-style decoder-only transformer architecture for research. No pretrained
weights -- pull the architecture from the Hub and instantiate a freshly initialised
model from config. Override any parameter at instantiation time.

> **Important:** `trust_remote_code=True` is required. It downloads the architecture
> source files from the Hub and imports them into your Python process. Review the
> source at [smithblack-0/llama3_baseline_dev](https://huggingface.co/smithblack-0/llama3_baseline_dev) before use.

## Usage

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Pull architecture config -- override any parameter at instantiation time
config = AutoConfig.from_pretrained(
    "smithblack-0/llama3_baseline_dev",
    trust_remote_code=True,
    num_hidden_layers=16,  # example override
)

# Instantiate with fresh random weights -- no checkpoint required
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("smithblack-0/llama3_baseline_dev")

# Save and reload after training
model.save_pretrained("./checkpoint")
model = AutoModelForCausalLM.from_pretrained("./checkpoint", trust_remote_code=True)
```

## Default Configuration

| Parameter | Default |
|-----------|---------|
| `vocab_size` | 50277 |
| `hidden_size` | 768 |
| `intermediate_size` | 1568 |
| `num_hidden_layers` | 24 |
| `num_attention_heads` | 16 |
| `num_key_value_heads` | 4 |
| `head_dim` | 48 |
| `max_position_embeddings` | 8192 |
| `rope_theta` | 500000.0 |

## License

MIT. Clean-room synthesis: the human author has not read the Llama source code.
Architectural decisions derive from the published paper. Tokenizer is GPT-NeoX
(`EleutherAI/gpt-neox-20b`, Apache 2.0).
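
As a sanity check on the defaults in the Default Configuration table, the sketch below derives a few quantities from them using plain arithmetic. It assumes a conventional grouped-query attention layout (one key and one value tensor per layer, each with `num_key_value_heads` heads of size `head_dim`) and a 2-byte cache dtype; neither is confirmed by this card, so treat the numbers as estimates. The `defaults` dict and the derived variable names are illustrative, not part of the library.

```python
# Default values copied from the Default Configuration table.
defaults = {
    "hidden_size": 768,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "head_dim": 48,
    "max_position_embeddings": 8192,
}

# head_dim times the number of query heads should recover hidden_size.
heads_times_dim = defaults["head_dim"] * defaults["num_attention_heads"]

# Query heads sharing each key/value head (assumes grouped-query attention).
queries_per_kv_head = (
    defaults["num_attention_heads"] // defaults["num_key_value_heads"]
)

# Cached elements per token: K and V, per layer, per KV head, per head dim.
kv_elements_per_token = (
    2
    * defaults["num_hidden_layers"]
    * defaults["num_key_value_heads"]
    * defaults["head_dim"]
)

# Cache size for one full-length sequence, assuming 2 bytes per element.
bytes_per_element = 2  # assumption: fp16 or bf16 cache
kv_cache_mib = (
    kv_elements_per_token
    * defaults["max_position_embeddings"]
    * bytes_per_element
    / 2**20
)

print(f"heads * head_dim       = {heads_times_dim}")
print(f"query heads per KV head = {queries_per_kv_head}")
print(f"KV elements per token   = {kv_elements_per_token}")
print(f"KV cache at max length  = {kv_cache_mib:.0f} MiB per sequence")
```

Changing `num_key_value_heads` or `num_hidden_layers` via an override at instantiation time scales the cache estimate linearly, which is a quick way to gauge memory impact before building a model.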