0rn0 committed · Commit cd75d4b · verified · 1 Parent(s): fb5267b

Upload README.md with huggingface_hub

Files changed (1): README.md +118 -6
README.md CHANGED
@@ -1,10 +1,122 @@
 ---
 tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
 ---

-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]
---
pipeline_tag: text-generation
library_name: pytorch
language:
- en
license: mit
datasets:
- roneneldan/TinyStories
tags:
- llama2
- tinystories
- from-scratch
- causal-lm
- story-generation
widget:
- text: "Once upon a time"
  example_title: "Story start"
  output:
    text: "Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed."
- text: "The little cat"
  example_title: "Animal story"
  output:
    text: "The little cat was very happy. She liked to play with her friends in the park. One day, she saw a big dog. The dog was very friendly and wanted to play too. They played together all day long."
- text: "One day, a boy"
  example_title: "Adventure"
  output:
    text: "One day, a boy named Tim went to the park with his mom. He saw a big tree and wanted to climb it. His mom said, 'Be careful, Tim!' Tim climbed up and saw a bird's nest with three little eggs."
---

# Llama 2 15M TinyStories

A 15M-parameter Llama 2 model pretrained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. Pretrained by [Andrej Karpathy](https://github.com/karpathy/llama2.c) (the stories15M checkpoint) and uploaded here for easy loading and fine-tuning.

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Llama 2 (RoPE, RMSNorm, SwiGLU, GQA) |
| Parameters | 15.2M |
| Vocabulary | 32,000 (SentencePiece) |
| Context Length | 256 |
| Embedding Dim | 288 |
| Attention Heads | 6 |
| KV Heads | 6 |
| Transformer Layers | 6 |
| Dropout | 0.0 |
| Activation | SiLU (SwiGLU) |

**Architecture:** Token embeddings → Dropout → 6× Transformer blocks (pre-norm RMSNorm, RoPE attention, SwiGLU FFN, residual connections) → RMSNorm → Linear output
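As a sanity check, the 15.2M figure can be reproduced from the table above. The sketch below assumes llama2.c's conventions (SwiGLU hidden dim of 2/3 · 4 · dim rounded up to a multiple of 32, output head tied to the token embedding, no biases); these are assumptions about the checkpoint, not guarantees.

```python
# Back-of-the-envelope parameter count for the stories15M config.
# Assumes llama2.c conventions: tied embedding/output weights, no biases,
# SwiGLU hidden dim = 2/3 * 4 * dim, rounded up to a multiple of 32.
dim, n_layers, vocab = 288, 6, 32000

hidden = 4 * dim * 2 // 3             # 768
hidden = 32 * ((hidden + 31) // 32)   # round up to a multiple of 32 -> still 768

embed = vocab * dim                   # token embedding (tied with output head)
attn = 4 * dim * dim                  # wq, wk, wv, wo (n_kv_heads == n_heads here)
ffn = 3 * dim * hidden                # w1, w2, w3 of the SwiGLU FFN
norms = 2 * dim                       # two RMSNorm gain vectors per block

total = embed + n_layers * (attn + ffn + norms) + dim  # + final RMSNorm
print(total)  # 15191712, i.e. ~15.2M
```

RoPE contributes no parameters, which is why it does not appear in the sum.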

## Training

| Metric | Value |
|---|---|
| Dataset | TinyStories |
| Iterations | 298,000 |
| Batch Size | 128 × 4 grad accum = 512 effective |
| Learning Rate | 5e-4 |
| Optimizer | AdamW (betas=0.9/0.95, weight_decay=0.1) |
| Precision | bfloat16 |
| Warmup | 1,000 iterations |
| Val Loss | 1.072 |
| Val Perplexity | 2.92 |
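Two of the numbers in the table are derived values and can be checked directly: the effective batch size is the micro-batch size times the gradient-accumulation steps, and perplexity is simply the exponential of the mean cross-entropy validation loss.

```python
import math

# Effective batch size = micro-batch size x gradient-accumulation steps.
micro_batch, grad_accum = 128, 4
effective = micro_batch * grad_accum
print(effective)        # 512

# Perplexity = exp(mean cross-entropy loss on the validation set).
val_loss = 1.072
ppl = math.exp(val_loss)
print(round(ppl, 2))    # 2.92
```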

## Sample Output

> Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed.
> Timmy's mom noticed that he had a thick book in his hand. She asked him what was inside. Timmy said he didn't know. His mom explained that the book was just a few days old and had gone to a faraway place. She told Timmy that he should take care of himself and rest. Timmy promised to take better care of himself.
> After a few days, Timmy felt much better. He went back to the beach and played in the sand. He made a big sandcastle and showed it to his mom. She was proud of him for taking care of himself. Timmy was happy that he...

*Generated with temperature=0.8, top_k=40*

## Usage

This is a **custom PyTorch model** (not a transformers-compatible model). You need the source code from the [GitHub repository](https://github.com/aryandeore/monday_morning_moral) to load it.

### Setup

```bash
# Clone the repository and install its dependencies with uv
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```
### Generate

```python
import torch
from sentencepiece import SentencePieceProcessor

from models.llama2 import Transformer  # from the repository above

# Load the model weights from the Hub (PyTorchModelHubMixin)
model = Transformer.from_pretrained("0rn0/llama2-15m-tinystories")
model.eval()

# Load the SentencePiece tokenizer
sp = SentencePieceProcessor(model_file="tokenizer.model")

# Encode a prompt, prepend BOS, and sample a continuation
prompt = "Once upon a time"
tokens = [sp.bos_id()] + sp.encode(prompt)
idx = torch.tensor([tokens])
with torch.no_grad():
    output = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40)
print(sp.decode(output[0].tolist()))
```
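The `temperature` and `top_k` knobs used above follow the standard sampling recipe: divide the logits by the temperature, keep only the k highest-scoring tokens, renormalize, and sample. A minimal dependency-free sketch (using made-up toy logits, not the model's) looks like this:

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample a token index from raw logits with temperature + top-k filtering."""
    # Keep only the top_k highest logits; everything else gets probability 0.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax over the survivors (max-subtraction for stability).
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)
toy_logits = [0.1, 2.5, -1.0, 0.7, 1.9]  # hypothetical scores for 5 tokens
picks = [sample_top_k(toy_logits, top_k=2, rng=rng) for _ in range(1000)]
# With top_k=2 only the two best-scoring tokens (indices 1 and 4) can appear.
print(sorted(set(picks)))  # [1, 4]
```

Lower temperatures sharpen the distribution toward the argmax; smaller `top_k` cuts off the low-probability tail, which is why 0.8/40 yields fluent but still varied stories.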

## Limitations

- Trained only on TinyStories — generates simple children's stories, not general text
- No instruction tuning — does not follow prompts or answer questions
- Small model — limited coherence over long sequences
- English only

## Credits

- Model weights from [karpathy/tinyllamas](https://huggingface.co/karpathy/tinyllamas)
- Architecture from [llama2.c](https://github.com/karpathy/llama2.c)
- Dataset: [TinyStories](https://arxiv.org/abs/2305.07759) by Eldan & Li

## Source Code

Full implementation: [github.com/aryandeore/monday_morning_moral](https://github.com/aryandeore/monday_morning_moral)