---
license: mit
---

# Custom Llama-style

This repository contains a single `.pt` checkpoint file from a fine-tuned model.

**This model is NOT directly usable with `transformers.AutoModel.from_pretrained()` yet.** It needs to be converted to the Hugging Face format first.
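Before attempting a conversion, it helps to inspect what the checkpoint actually contains. Below is a minimal sketch, assuming the file is a plain PyTorch state dict; the filename `model.pt` and the `"model"` wrapper key are hypothetical and should be adjusted to the real file:

```python
from pathlib import Path

import torch


def list_checkpoint_keys(path):
    """Load a .pt checkpoint on CPU and return its sorted parameter names."""
    state = torch.load(path, map_location="cpu")
    # Some trainers save extra state (optimizer, step count) and nest the
    # weights under a "model" key; unwrap it if present.
    if isinstance(state, dict) and isinstance(state.get("model"), dict):
        state = state["model"]
    return sorted(state.keys())


if __name__ == "__main__":
    ckpt = Path("model.pt")  # hypothetical filename; adjust to the actual file
    if ckpt.exists():
        for name in list_checkpoint_keys(ckpt):
            print(name)
```

The printed key names reveal how the training code named each layer, which is the information needed to write a key-renaming map into the Hugging Face layout.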

## Training Details

- **Framework:** [modded-nanoGPT-soap](https://github.com/nikhilvyas/modded-nanogpt-SOAP)
- **Architecture:** This model uses modern features and is NOT a standard GPT-2:
  - **Positional Embeddings:** Rotary Position Embeddings (RoPE)
  - **Normalization:** RMSNorm
  - **Bias:** Linear layers trained with `bias=False`

## Model Configuration

These are the hyperparameters needed to perform the conversion:

- `n_layer`: 12
- `n_head`: 12
- `n_embd`: 768
- `vocab_size`: 50257
- `block_size`: 1024
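As a sketch, these numbers map onto Hugging Face `LlamaConfig` field names roughly as follows. Note that the MLP width is not stated on this card (a 4x expansion is assumed, as in common nanoGPT variants), and the absence of grouped-query attention is likewise an assumption; both must be verified against the training code before converting:

```python
# nanoGPT-style hyperparameters as listed on this card.
nanogpt_config = {
    "n_layer": 12,
    "n_head": 12,
    "n_embd": 768,
    "vocab_size": 50257,
    "block_size": 1024,
}

# Hypothetical mapping to Hugging Face LlamaConfig field names.
hf_config = {
    "num_hidden_layers": nanogpt_config["n_layer"],
    "num_attention_heads": nanogpt_config["n_head"],
    "num_key_value_heads": nanogpt_config["n_head"],  # ASSUMPTION: no GQA
    "hidden_size": nanogpt_config["n_embd"],
    "vocab_size": nanogpt_config["vocab_size"],
    "max_position_embeddings": nanogpt_config["block_size"],
    "intermediate_size": 4 * nanogpt_config["n_embd"],  # ASSUMPTION: 4x MLP
    "attention_bias": False,  # card states bias=False
    "mlp_bias": False,
}

# Per-head dimension implied by the numbers above.
print(hf_config["hidden_size"] // hf_config["num_attention_heads"])  # prints 64
```

Because the model already uses RoPE and RMSNorm, the Llama architecture is a closer structural match than GPT-2, which is why `LlamaConfig` field names are used here.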

## Tokenizer

The model was trained with the standard `gpt2` tokenizer.
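For example, the matching tokenizer can be pulled straight from the Hub once `transformers` is installed (the first run downloads the tokenizer files):

```python
from transformers import AutoTokenizer

# Load the stock GPT-2 BPE tokenizer the model was trained with.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# vocab_size is 50257, matching the `vocab_size` in the configuration above.
ids = tokenizer("Hello world")["input_ids"]
print(tokenizer.vocab_size, ids)
```

The same tokenizer files can be saved alongside the converted weights with `tokenizer.save_pretrained(...)` so the repository becomes loadable end to end.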