jonmabe committed
Commit 81f2447 · verified · 1 Parent(s): 28fc45c

Update model card with documentation and examples

Files changed (1): README.md +127 -0
README.md ADDED
@@ -0,0 +1,127 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
  results: []
---

# Tiny-LLM 54M

A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.

## Model Description

This is a decoder-only transformer trained from scratch on English Wikipedia text. It shows that a working language model can be trained on a single consumer GPU (an NVIDIA RTX 5090) in about three hours.

### Architecture

| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |

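The parameter count can be reproduced from the table as a sanity check. The sketch below is not taken from the repository. It assumes a LLaMA-style layout: no bias terms, four square attention projections per layer, a three-matrix SwiGLU feed-forward block, two RMSNorm weight vectors per layer plus a final one, and an output head that shares the embedding matrix. RoPE adds no learned parameters. The function name and its arguments are illustrative.

```python
def count_parameters(vocab_size=32_000, hidden=512, layers=12, ffn=1408, tied=True):
    """Estimate parameters for a LLaMA-style decoder (assumed layout, no biases)."""
    embedding = vocab_size * hidden
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    feed_forward = 3 * hidden * ffn      # SwiGLU: gate, up and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    total = embedding + layers * per_layer + hidden   # + final RMSNorm
    if not tied:
        total += vocab_size * hidden     # separate output head
    return total


total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
# prints "54,931,968 parameters (54.93M)"
```

Under these assumptions the estimate matches the reported 54.93M, with the tied embedding matrix (about 16.4M) accounting for roughly 30% of the total.
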
### Training Details

| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |

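The table's figures can be related with a little arithmetic. The sketch below assumes that the batch size counts sequences (not tokens) and that every sequence is packed to the 512-token maximum. The card states neither, so the results are a rough reading of the table and not reported measurements.

```python
steps = 50_000
batch_size = 32           # assumed to count sequences, not tokens
seq_len = 512             # assumed: every sequence packed to the maximum length
dataset_tokens = 100e6    # "~100M" from the table
training_hours = 3        # "~3 hours" from the table

tokens_per_step = batch_size * seq_len
tokens_processed = steps * tokens_per_step
passes = tokens_processed / dataset_tokens
tokens_per_second = tokens_processed / (training_hours * 3600)

print(f"tokens per step:   {tokens_per_step:,}")
print(f"tokens processed:  {tokens_processed:,}")
print(f"passes over data:  {passes:.1f}")
print(f"approx. tokens/s:  {tokens_per_second:,.0f}")
```

If those assumptions hold, the model processed about 819M tokens in total, which would correspond to roughly eight passes over a ~100M-token dataset.
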
## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer (uses a standard GPT-2 style tokenizer)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model uses a custom architecture. For model loading, see the model files;
# inference code is in scripts/ in the repository.
```

### Generation Example

```python
# Note: This model uses a custom architecture
# Full inference code available in the repository

prompt = "The history of artificial intelligence"
# Model generates continuation based on learned Wikipedia patterns
```

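The general shape of autoregressive generation can be sketched independently of the model. The code below is a generic temperature and top-k sampling loop, not the repository's inference code. `next_token_logits` is a hypothetical stand-in for a forward pass that returns one score per vocabulary entry, and `toy_logits` exists only to make the example runnable.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token id from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        # Greedy decoding: pick the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    ids = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
    if top_k is not None:
        ids = ids[:top_k]
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)                      # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(next_token_logits, prompt_ids, max_new_tokens=20, max_len=512, **kwargs):
    """Append sampled tokens one at a time, feeding each result back as context."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-max_len:]            # stay within the 512-token window
        ids.append(sample_next(next_token_logits(context), **kwargs))
    return ids


def toy_logits(context, vocab_size=8):
    """Hypothetical stand-in model: strongly prefers (last token + 1) mod vocab_size."""
    target = (context[-1] + 1) % vocab_size
    return [5.0 if i == target else 0.0 for i in range(vocab_size)]


print(generate(toy_logits, [0], max_new_tokens=5, temperature=0))
# prints [0, 1, 2, 3, 4, 5]
```

With the real model, the prompt would first be encoded to token ids with the tokenizer and the generated ids decoded back to text. Lower temperatures and smaller `top_k` values make the output more deterministic.
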
## Intended Use

- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments

## Limitations

- Small model size limits knowledge and capabilities
- Trained only on English Wikipedia, so domain coverage is limited
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning

## Training Data

- **Source**: Wikipedia (English)
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next token prediction)

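Next-token prediction means each training target is the input sequence shifted left by one position. The sketch below illustrates this on a toy token stream. It assumes non-overlapping fixed-length windows, which is one common way to pack data and not necessarily how this model's training script does it. The function name is illustrative.

```python
def make_training_pairs(token_ids, seq_len=512):
    """Split a token stream into (input, target) pairs for next-token prediction."""
    pairs = []
    # Each window needs seq_len + 1 tokens: the targets are the inputs shifted by one.
    for start in range(0, len(token_ids) - seq_len, seq_len):
        window = token_ids[start : start + seq_len + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs


stream = list(range(10))   # toy token ids standing in for tokenized text
for inputs, targets in make_training_pairs(stream, seq_len=4):
    print(inputs, "->", targets)
# prints:
# [0, 1, 2, 3] -> [1, 2, 3, 4]
# [4, 5, 6, 7] -> [5, 6, 7, 8]
```

At every position the model sees the tokens up to that point and is trained to predict the one that follows.
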
## Future Work

This model is intended as a base for:
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities

## Hardware Requirements

- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2GB GPU memory recommended

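A rough breakdown shows where these figures might come from. The estimate below covers only weights and optimizer state. It assumes fp32 or fp16 weights for inference and fp32 AdamW for fine-tuning (weights, gradients and two moment buffers). Activations, the KV cache and framework overhead come on top and are not modeled, so this is an illustration and not a measurement.

```python
PARAMS = 54_930_000   # ~54.93M, from the architecture table


def weights_mb(n_params, bytes_per_param):
    """Memory in megabytes (10^6 bytes) for n_params values of the given width."""
    return n_params * bytes_per_param / 1e6


fp32 = weights_mb(PARAMS, 4)
fp16 = weights_mb(PARAMS, 2)
# Assumed fp32 AdamW fine-tuning: weights + gradients + two optimizer moments.
adamw_state = weights_mb(PARAMS, 4 * 4)

print(f"fp32 weights:            {fp32:.0f} MB")
print(f"fp16 weights:            {fp16:.0f} MB")
print(f"AdamW training state:    {adamw_state:.0f} MB")
```

Under these assumptions, fp32 weights take about 220 MB and AdamW training state about 880 MB, which leaves headroom for activations within the ~300MB and ~2GB figures above.
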
## Related Work

Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices

## Citation

```bibtex
@misc{tiny-llm-54m,
  author = {jonmabe},
  title = {Tiny-LLM: A 54M Parameter Language Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```