---
license: mit
---
# Custom Llama-style Model
This repository contains a single `.pt` checkpoint file from a fine-tuned Llama-style model.
**This checkpoint is NOT directly usable with `transformers.AutoModel.from_pretrained()` yet.** It must first be converted to the Hugging Face format.
## Training Details
- **Framework:** [modded-nanoGPT-soap](https://github.com/nikhilvyas/modded-nanogpt-SOAP)
- **Architecture:** Llama-style, NOT a standard GPT-2, so GPT-2 conversion scripts will not work as-is:
  - **Positional embeddings:** Rotary Position Embeddings (RoPE)
  - **Normalization:** RMSNorm
  - **Bias:** linear layers trained with `bias=False`
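As a quick reference for anyone writing a conversion script, the following is a minimal dependency-free sketch of RMSNorm as used in Llama-style models (the `eps` value is an assumption; check the training code for the actual epsilon):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale each element by the reciprocal root-mean-square of the
    # vector, then apply a learned per-dimension gain. Unlike LayerNorm there
    # is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

normed = rms_norm([3.0, 4.0], [1.0, 1.0])
```

With a unit gain, each element is simply divided by the vector's root mean square; the learned `weight` then rescales each dimension independently.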
## Model Configuration
These are the hyperparameters needed to perform the conversion:
- `n_layer`: 12
- `n_head`: 12
- `n_embd`: 768
- `vocab_size`: 50257
- `block_size`: 1024
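For a Hugging Face conversion, the nanoGPT-style names above would need to be mapped onto the equivalent `LlamaConfig` fields. The sketch below shows one plausible mapping; the field-name correspondence is an assumption based on common conventions, and values not listed here (MLP width, norm epsilon, RoPE base) would have to be read out of the training code or checkpoint:

```python
# Hyperparameters as listed in this model card (nanoGPT-style names).
config = {
    "n_layer": 12,      # transformer blocks
    "n_head": 12,       # attention heads
    "n_embd": 768,      # model/hidden dimension
    "vocab_size": 50257,
    "block_size": 1024, # maximum sequence length
}

# Per-head dimension, derived from the values above.
head_dim = config["n_embd"] // config["n_head"]  # 768 // 12 = 64

# Hypothetical mapping onto transformers.LlamaConfig field names.
hf_config = {
    "num_hidden_layers": config["n_layer"],
    "num_attention_heads": config["n_head"],
    "hidden_size": config["n_embd"],
    "vocab_size": config["vocab_size"],
    "max_position_embeddings": config["block_size"],
}
```

Checkpoint tensor names from modded-nanoGPT will also differ from the Hugging Face state-dict layout, so a conversion script would need to rename keys in addition to building this config.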
## Tokenizer
The model was trained with the standard `gpt2` tokenizer.