---
license: mit
---

# Custom Llama-style Model

This repository contains a single `.pt` checkpoint file from a fine-tuned model.

**This model is NOT directly usable with `transformers.AutoModel.from_pretrained()` yet.** It must first be converted to the Hugging Face format.

## Training Details

- **Framework:** [modded-nanoGPT-SOAP](https://github.com/nikhilvyas/modded-nanogpt-SOAP)
- **Architecture:** a modernized GPT-2 variant; it is NOT a standard GPT-2 and differs in the following ways:
  - **Positional Embeddings:** Rotary Position Embeddings (RoPE)
  - **Normalization:** RMSNorm
  - **Bias:** linear layers trained with `bias=False`

## Model Configuration

The following hyperparameters are needed to perform the conversion:

- `n_layer`: 12
- `n_head`: 12
- `n_embd`: 768
- `vocab_size`: 50257
- `block_size`: 1024

## Tokenizer

The model was trained with the standard `gpt2` tokenizer.
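
Because the checkpoint combines RoPE, RMSNorm, and bias-free linear layers, its architecture lines up with `transformers`' Llama implementation. The sketch below shows one plausible way to express the hyperparameters above as a `LlamaConfig` as a starting point for conversion; it is an assumption, not the repository's official recipe. In particular, the `intermediate_size` (a conventional 4x MLP expansion) and the RoPE base frequency are guesses, and the actual state-dict key remapping from the `.pt` file depends on the training code and is not shown.

```python
# Hedged sketch: map the model-card hyperparameters onto a Hugging Face
# LlamaConfig. The architecture match (RoPE + RMSNorm + bias=False) is
# inferred from the card; intermediate_size is an assumed 4x expansion.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=50257,               # vocab_size
    hidden_size=768,                # n_embd
    num_hidden_layers=12,           # n_layer
    num_attention_heads=12,         # n_head
    max_position_embeddings=1024,   # block_size
    intermediate_size=4 * 768,      # ASSUMPTION: standard 4x MLP width
    attention_bias=False,           # linear layers trained with bias=False
    mlp_bias=False,                 # likewise for the MLP projections
)

# head_dim follows from the card's numbers: 768 / 12 = 64
print(config.hidden_size // config.num_attention_heads)
```

After remapping the `.pt` state dict onto this layout, the result could be loaded with `LlamaForCausalLM(config)` and saved via `save_pretrained()`, at which point `AutoModel.from_pretrained()` would work.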