batmanLovesAI committed · verified
Commit cba4ded · Parent(s): dd474d5

Update README.md
Files changed (1): README.md (+92 −3)

README.md CHANGED
---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- slm
- transformer
- attention
- optimization
- pytorch
- tinystories
- educational
---
# Model Card for Helium-Nano (45M)

**Helium-Nano** is a 45-million-parameter Small Language Model (SLM) trained on the TinyStories dataset. It demonstrates how a highly optimized custom Transformer architecture can achieve coherent English storytelling with minimal compute. The model was trained in under 1 hour on a single NVIDIA L4 GPU, reaching a throughput of **409k tokens/second** via PyTorch 2.0's `torch.compile` and architectural optimizations.
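
The throughput recipe can be illustrated with a minimal (assumed) PyTorch pattern — an `nn.Linear` stands in for the real Transformer, and `torch.compile` is left commented out because it pays off on GPU:

```python
import torch
import torch.nn as nn

# Sketch of the training-speed pattern described above, NOT the repository's
# exact code: bfloat16 autocast around the forward pass, with torch.compile
# wrapping the model on a real GPU run.
model = nn.Linear(512, 512)          # stand-in for the Transformer
# model = torch.compile(model)       # Inductor JIT compilation (biggest win on GPU)
x = torch.randn(4, 512)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)                   # matmuls run in bfloat16 under autocast
```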
22
+
23
+ ## Model Details
24
+
25
+ ### Model Description
26
+
27
+ Helium-Nano is a decoder-only Transformer designed to investigate training dynamics and scaling laws in low-resource environments. Despite its small size, it produces grammatically correct and narratively consistent short stories.
28
+
29
+ The primary goal of this model was engineering efficiency. By implementing **BFloat16 mixed precision**, **Flash Attention principles**, **Torch.compile (Inductor)**, and **Float32-optimized Rotary Embeddings (RoPE)**, the training pipeline achieved a 16x speedup over standard eager-mode baselines.
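
The Float32-optimized RoPE idea can be sketched as follows — the frequencies are computed and applied in float32 for numerical stability, then cast back to the activation dtype. The cache layout and pair-wise rotation are assumptions about a typical implementation, not the repository's exact code:

```python
import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Frequencies are kept in float32 even when the model runs in bfloat16
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)          # (seq_len, head_dim / 2), float32
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    # x: (..., seq_len, head_dim); rotate even/odd pairs in float32, cast back
    x32 = x.float()
    x1, x2 = x32[..., ::2], x32[..., 1::2]
    out = torch.empty_like(x32)
    out[..., ::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.to(x.dtype)
```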
30
+
31
+ - **Developed by:** Debmalya/batmanLovesAI
32
+ - **Model type:** Decoder-only Transformer (Custom Architecture)
33
+ - **Language(s) (NLP):** English
34
+ - **License:** MIT
35
+ - **Finetuned from model:** N/A (Trained from scratch)
36
+
37
+ ### Model Sources
38
+
39
+ - **Repository:** [Link to Github Repo](https://github.com/DebmalyaSen34/HeliumLM)
40
+ - **Dataset Paper:** [TinyStories: How Small Can Language Models Be?](https://arxiv.org/abs/2305.07759)
41
+ - **Optimization Techniques:** [Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation](https://arxiv.org/abs/2505.19529)
42
+
## Uses

### Direct Use

- **Story Generation:** Generating simple, coherent short stories suitable for early-childhood reading levels.
- **Education:** A lightweight baseline for experimenting with model interpretation, quantization, or fine-tuning on consumer hardware.
- **Performance Benchmarking:** Testing inference speeds of small Transformers on various hardware.

### Out-of-Scope Use

- **Factual Queries:** The model is trained on fiction; it has no world knowledge and will hallucinate facts.
- **Reasoning/Math:** The model is not capable of complex logic or arithmetic.
- **Harmful Content:** While the dataset is heavily filtered, users should not attempt to generate toxic or biased content.
## Bias, Risks, and Limitations

- **Dataset Bias:** The model reflects the vocabulary and concepts found in the TinyStories dataset, which focuses on simple, positive narratives using a limited vocabulary (roughly a 3-year-old's level).
- **Repetition:** Like many SLMs, the model may enter repetitive loops if the temperature is too low or no repetition penalty is applied during inference.
- **Hallucinations:** The model prioritizes grammatical structure over semantic logic.
## How to Get Started with the Model

Since this uses a custom architecture, you need to instantiate the model class before loading the weights.

```python
import torch
from tokenizers import Tokenizer

# Assumes the TinySLM class is defined in your local files (see the repository)

# 1. Load the tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# 2. Initialize the model
config = {
    "vocab_size": 32000,
    "d_model": 512,
    "n_head": 8,
    "n_layers": 10,
    "max_seq_len": 512,
}
model = TinySLM(config)

# 3. Load the weights
state_dict = torch.load("helium_nano_45m.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# 4. Generate (minimal greedy loop; assumes model(ids) returns logits of
#    shape (batch, seq_len, vocab_size) -- adapt to your forward signature)
prompt = "Once upon a time, there was a little"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():
    for _ in range(50):
        logits = model(ids)
        next_id = logits[0, -1].argmax().view(1, 1)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```