---
pipeline_tag: text-generation
library_name: pytorch
language:
- en
license: mit
datasets:
- roneneldan/TinyStories
tags:
- llama2
- tinystories
- from-scratch
- causal-lm
- story-generation
widget:
- text: "Once upon a time"
  example_title: "Story start"
  output:
    text: "Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed."
- text: "The little cat"
  example_title: "Animal story"
  output:
    text: "The little cat was very happy. She liked to play with her friends in the park. One day, she saw a big dog. The dog was very friendly and wanted to play too. They played together all day long."
- text: "One day, a boy"
  example_title: "Adventure"
  output:
    text: "One day, a boy named Tim went to the park with his mom. He saw a big tree and wanted to climb it. His mom said, 'Be careful, Tim!' Tim climbed up and saw a bird's nest with three little eggs."
---

# Llama 2 15M — TinyStories

A 15M-parameter Llama 2 model pretrained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. The weights are [Andrej Karpathy](https://github.com/karpathy/llama2.c)'s stories15M checkpoint, uploaded here for easy loading and fine-tuning.

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Llama 2 (RoPE, RMSNorm, SwiGLU, GQA) |
| Parameters | 15.2M |
| Vocabulary | 32,000 (SentencePiece) |
| Context Length | 256 |
| Embedding Dim | 288 |
| Attention Heads | 6 |
| KV Heads | 6 |
| Transformer Layers | 6 |
| Dropout | 0.0 |
| Activation | SiLU (SwiGLU) |

**Architecture:** Token embeddings → Dropout → 6x Transformer blocks (pre-norm RMSNorm, RoPE attention, SwiGLU FFN, residual connections) → RMSNorm → Linear output
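
The parameter count in the table can be checked by hand from these dimensions. A minimal sketch, assuming the llama2.c defaults for this checkpoint (a SwiGLU hidden dim of 768 and an output head tied to the token embeddings; both are assumptions, not stated in the table):

```python
# Back-of-the-envelope parameter count for the stories15M configuration.
# hidden=768 and weight tying are assumptions based on llama2.c defaults.
vocab, dim, n_layers, hidden = 32000, 288, 6, 768

embed = vocab * dim                  # token embeddings (tied with output head)
attn = 4 * dim * dim                 # wq, wk, wv, wo per layer
ffn = 3 * dim * hidden               # w1, w2, w3 (SwiGLU) per layer
norms = 2 * dim                      # two RMSNorms per layer
per_layer = attn + ffn + norms

total = embed + n_layers * per_layer + dim  # + final RMSNorm
print(f"{total:,}")  # 15,191,712 ≈ 15.2M
```

With weight tying, the 32,000 × 288 embedding matrix accounts for well over half of the total.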

## Training

| Metric | Value |
|---|---|
| Dataset | TinyStories |
| Iterations | 298,000 |
| Batch Size | 128 x 4 grad accum = 512 effective |
| Learning Rate | 5e-4 |
| Optimizer | AdamW (betas=0.9/0.95, weight_decay=0.1) |
| Precision | bfloat16 |
| Warmup | 1,000 iterations |
| Val Loss | 1.072 |
| Val Perplexity | 2.92 |

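The table fixes the peak learning rate and the warmup length but not the decay shape. A sketch of a typical llama2.c-style schedule, assuming linear warmup followed by cosine decay (the cosine shape and the `min_lr` value are illustrative assumptions, not from this card):

```python
import math

# MAX_LR and WARMUP come from the training table above;
# MIN_LR and the cosine decay are assumptions for illustration.
MAX_LR, WARMUP, MAX_ITERS, MIN_LR = 5e-4, 1000, 298000, 5e-5

def get_lr(it):
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if it < WARMUP:
        return MAX_LR * it / WARMUP
    progress = (it - WARMUP) / (MAX_ITERS - WARMUP)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays 1 -> 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)

print(get_lr(500))     # halfway through warmup: 2.5e-4
print(get_lr(1000))    # peak: 5e-4
print(get_lr(298000))  # fully decayed: 5e-5
```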
## Sample Output

> Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed.
>
> Timmy's mom noticed that he had a thick book in his hand. She asked him what was inside. Timmy said he didn't know. His mom explained that the book was just a few days old and had gone to a faraway place. She told Timmy that he should take care of himself and rest. Timmy promised to take better care of himself.
>
> After a few days, Timmy felt much better. He went back to the beach and played in the sand. He made a big sandcastle and showed it to his mom. She was proud of him for taking care of himself. Timmy was happy that he...

*Generated with temperature=0.8, top_k=40*

## Usage

This is a **custom PyTorch model** (not a transformers-compatible model). You need the source code from the [GitHub repository](https://github.com/aryandeore/monday_morning_moral) to load it.

### Setup

```bash
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```

### Generate

```python
import torch
from models.llama2 import Transformer
from sentencepiece import SentencePieceProcessor

# Load model
model = Transformer.from_pretrained("0rn0/llama2-15m-tinystories")
model.eval()

# Load tokenizer
sp = SentencePieceProcessor(model_file="tokenizer.model")

# Generate
prompt = "Once upon a time"
tokens = [sp.bos_id()] + sp.encode(prompt)
idx = torch.tensor([tokens])
output = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40)
print(sp.decode(output[0].tolist()))
```

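As context for the `temperature` and `top_k` arguments above, here is a minimal sketch of one top-k sampling step (a generic illustration of the technique, not the repository's actual `generate` implementation):

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=40):
    """One decoding step: scale logits by temperature, keep only the
    top_k highest-scoring tokens, and sample from the renormalized
    softmax over that subset. Generic illustration only."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx[choice]

# Example with random logits over the 32,000-token vocabulary
logits = torch.randn(32000)
next_id = sample_next_token(logits)
```

Lower temperatures sharpen the distribution toward the most likely token; `top_k` caps how many candidates can be sampled at all.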
## Limitations

- Trained only on TinyStories — generates simple children's stories, not general text
- No instruction tuning — does not follow prompts or answer questions
- Small model — limited coherence over long sequences
- English only

## Credits

- Model weights from [karpathy/tinyllamas](https://huggingface.co/karpathy/tinyllamas)
- Architecture from [llama2.c](https://github.com/karpathy/llama2.c)
- Dataset: [TinyStories](https://arxiv.org/abs/2305.07759) by Eldan & Li

## Source Code

Full implementation: [github.com/aryandeore/monday_morning_moral](https://github.com/aryandeore/monday_morning_moral)