---
license: mit
---
# Custom Llama-style Model
This repository contains a single `.pt` checkpoint file from a fine-tuned Llama-style model.
**This checkpoint is NOT directly usable with `transformers.AutoModel.from_pretrained()` yet.** It must first be converted to the Hugging Face format.
## Training Details
- **Framework:** [modded-nanoGPT-soap](https://github.com/nikhilvyas/modded-nanogpt-SOAP)
- **Architecture:** Llama-style, NOT a standard GPT-2, so GPT-2 conversion scripts will not work as-is:
  - **Positional embeddings:** Rotary Position Embeddings (RoPE)
  - **Normalization:** RMSNorm
  - **Bias:** linear layers trained with `bias=False`
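As a quick reference for anyone writing a conversion script, the following is a minimal dependency-free sketch of RMSNorm as used in Llama-style models (the `eps` value is an assumption; check the training code for the actual epsilon):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale each element by the reciprocal root-mean-square of the
    # vector, then apply a learned per-dimension gain. Unlike LayerNorm there
    # is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

normed = rms_norm([3.0, 4.0], [1.0, 1.0])
```

With a unit gain, each element is simply divided by the vector's root mean square; the learned `weight` then rescales each dimension independently.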
## Model Configuration
These are the hyperparameters needed to perform the conversion:
- `n_layer`: 12
- `n_head`: 12
- `n_embd`: 768
- `vocab_size`: 50257
- `block_size`: 1024
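For a Hugging Face conversion, the nanoGPT-style names above would need to be mapped onto the equivalent `LlamaConfig` fields. The sketch below shows one plausible mapping; the field-name correspondence is an assumption based on common conventions, and values not listed here (MLP width, norm epsilon, RoPE base) would have to be read out of the training code or checkpoint:

```python
# Hyperparameters as listed in this model card (nanoGPT-style names).
config = {
    "n_layer": 12,      # transformer blocks
    "n_head": 12,       # attention heads
    "n_embd": 768,      # model/hidden dimension
    "vocab_size": 50257,
    "block_size": 1024, # maximum sequence length
}

# Per-head dimension, derived from the values above.
head_dim = config["n_embd"] // config["n_head"]  # 768 // 12 = 64

# Hypothetical mapping onto transformers.LlamaConfig field names.
hf_config = {
    "num_hidden_layers": config["n_layer"],
    "num_attention_heads": config["n_head"],
    "hidden_size": config["n_embd"],
    "vocab_size": config["vocab_size"],
    "max_position_embeddings": config["block_size"],
}
```

Checkpoint tensor names from modded-nanoGPT will also differ from the Hugging Face state-dict layout, so a conversion script would need to rename keys in addition to building this config.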
## Tokenizer
The model was trained with the standard `gpt2` tokenizer.