Update README.md
README.md CHANGED
@@ -1,3 +1,92 @@
---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- slm
- transformer
- attention
- optimization
- pytorch
- tinystories
- educational
---
# Model Card for Helium-Nano (45M)

**Helium-Nano** is a 45-million-parameter Small Language Model (SLM) trained on the TinyStories dataset. It demonstrates that a highly optimized custom Transformer architecture can produce coherent English storytelling with minimal compute. The model was trained in under one hour on a single NVIDIA L4 GPU, reaching a throughput of **409k tokens/second** via PyTorch 2.x `torch.compile` and architectural optimizations.
## Model Details

### Model Description

Helium-Nano is a decoder-only Transformer designed to investigate training dynamics and scaling laws in low-resource environments. Despite its small size, it produces grammatically correct and narratively consistent short stories.

The primary goal of this model was engineering efficiency. By combining **BFloat16 mixed precision**, **Flash Attention principles**, **`torch.compile` (Inductor)**, and **float32-optimized Rotary Embeddings (RoPE)**, the training pipeline achieved a 16x speedup over a standard eager-mode baseline.
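As a rough illustration of how these pieces typically fit together, here is a minimal sketch of a compiled BF16 training step. It is not the repository's actual training loop: `TinySLM`, `config`, and `train_loader` stand in for the real model class, hyperparameters, and data pipeline.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: TinySLM, config, and train_loader are placeholders,
# and the hyperparameters below are illustrative.
model = TinySLM(config).cuda()
model = torch.compile(model)  # Inductor backend; fuses kernels for the custom blocks
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for input_ids, targets in train_loader:
    input_ids, targets = input_ids.cuda(), targets.cuda()
    optimizer.zero_grad(set_to_none=True)
    # BFloat16 mixed precision: forward/backward run in bf16, optimizer state stays in fp32
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(input_ids)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    loss.backward()
    optimizer.step()
```

Inside the attention layers, `torch.nn.functional.scaled_dot_product_attention` can dispatch to a fused Flash-Attention-style kernel, and keeping the RoPE sin/cos tables in float32 avoids precision loss when rotations are applied to bf16 activations.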
- **Developed by:** Debmalya/batmanLovesAI
- **Model type:** Decoder-only Transformer (custom architecture)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** N/A (trained from scratch)

### Model Sources

- **Repository:** [HeliumLM on GitHub](https://github.com/DebmalyaSen34/HeliumLM)
- **Dataset Paper:** [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759)
- **Optimization Techniques:** [Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation](https://arxiv.org/abs/2505.19529)
## Uses

### Direct Use

- **Story Generation:** Generating simple, coherent short stories suitable for early childhood reading levels.
- **Educational:** A lightweight baseline for experimenting with model interpretation, quantization, or fine-tuning on consumer hardware.
- **Performance Benchmarking:** Testing inference speeds of small Transformers on various hardware.

### Out-of-Scope Use

- **Factual Queries:** The model is trained on fiction; it has no world knowledge and will hallucinate facts.
- **Reasoning/Math:** The model is not capable of complex logic or arithmetic.
- **Harmful Content:** While the dataset is heavily filtered, users should not attempt to generate toxic or biased content.
## Bias, Risks, and Limitations

- **Dataset Bias:** The model reflects the vocabulary and concepts found in the TinyStories dataset, which focuses on simple, positive narratives using a limited vocabulary (approximately a 3-year-old's level).
- **Repetition:** Like many SLMs, the model may enter repetitive loops if the sampling temperature is too low or no repetition penalty is applied during inference; see the generation sketch at the end of this card.
- **Hallucinations:** The model prioritizes grammatical structure over semantic logic, so stories may contain internally inconsistent details.
## How to Get Started with the Model

Because this model uses a custom architecture, you need to instantiate the `TinySLM` model class from the repository before loading the weights.
```python
import torch
from tokenizers import Tokenizer
# Assuming the TinySLM class is defined in your local files

# 1. Load Tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# 2. Initialize Model
config = {
    "vocab_size": 32000,
    "d_model": 512,
    "n_head": 8,
    "n_layers": 10,
    "max_seq_len": 512
}
model = TinySLM(config)

# 3. Load Weights
state_dict = torch.load("helium_nano_45m.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# 4. Generate
prompt = "Once upon a time, there was a little"
# ... inference code ...
```