chegde/llama-fineweb-base


chegde committed on Aug 3, 2025

Commit 6bbf8c7 · verified · 1 Parent(s): be4ee5c

Update README.md

Files changed (1)

README.md +31 -3

README.md CHANGED

@@ -1,3 +1,31 @@
----
-license: mit
----

+---
+license: mit
+---
+# Custom Llama-style Model
+This repository contains a single `.pt` checkpoint file from a fine-tuned model.
+**This model is NOT directly usable with `transformers.AutoModel.from_pretrained()` yet.** It needs to be converted to the Hugging Face format first.
+## Training Details
+- **Framework:** [modded-nanoGPT-soap](https://github.com/nikhilvyas/modded-nanogpt-SOAP)
+- **Architecture:** Llama-style, NOT a standard GPT-2. It differs from GPT-2 in the following ways:
+  - **Positional Embeddings:** Rotary Position Embeddings (RoPE)
+  - **Normalization:** RMSNorm
+  - **Bias:** Linear layers trained with `bias=False`.
+## Model Configuration
+The following hyperparameters are needed to perform the conversion:
+- `n_layer`: 12
+- `n_head`: 12
+- `n_embd`: 768
+- `vocab_size`: 50257
+- `block_size`: 1024
+## Tokenizer
+The model was trained with the standard `gpt2` tokenizer.
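
For readers planning the conversion, the hyperparameters listed in the README could be collected and renamed along these lines. This is only a sketch under stated assumptions: `CheckpointConfig` and `to_llama_kwargs` are hypothetical helper names, and the target key names follow the keyword arguments of `transformers.LlamaConfig`. The README does not state the MLP width, so `intermediate_size` is deliberately left out, and the layout of the `.pt` state dict is not described, so no weight renaming is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointConfig:
    """Hyperparameters copied from the README's Model Configuration list.

    Hypothetical helper class; it is not part of the repository.
    """
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    vocab_size: int = 50257
    block_size: int = 1024

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the embedding width.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        dim = self.n_embd // self.n_head
        # RoPE rotates pairs of channels, so the head width must be even.
        if dim % 2 != 0:
            raise ValueError("RoPE needs an even head dimension")
        return dim


def to_llama_kwargs(cfg: CheckpointConfig) -> dict:
    """Rename nanoGPT-style fields to LlamaConfig-style keyword names.

    Assumption: intermediate_size is omitted because the README does not
    give the MLP hidden width.
    """
    _ = cfg.head_dim  # run the divisibility checks before converting
    return {
        "hidden_size": cfg.n_embd,
        "num_hidden_layers": cfg.n_layer,
        "num_attention_heads": cfg.n_head,
        "vocab_size": cfg.vocab_size,
        "max_position_embeddings": cfg.block_size,
    }


cfg = CheckpointConfig()
llama_kwargs = to_llama_kwargs(cfg)
print(cfg.head_dim)
print(llama_kwargs)
```

The resulting dictionary is only the configuration half of a conversion. The checkpoint's tensors would still need to be renamed to match the target model's parameter names, which depends on how the training framework saved them.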