---
language:
- en
license: mit
tags:
- llm
- decoder-only
- transformer
- from-scratch
- research
- educational
- 80m
- pytorch
- pretraining
- custom-architecture
pipeline_tag: text-generation
inference:
parameters:
temperature: 0.7
top_p: 0.95
---
# Mini-LLM — 80M Parameter Transformer (Pretrained From Scratch)
**Mini-LLM** is an 80M parameter decoder-only transformer trained **fully from scratch** using a custom tokenizer, custom architecture, and custom training loop.
It is designed as an educational, research-friendly minimal LLM that demonstrates how modern LLM components are built end to end.
---
## Key Features
- **80M parameters** — compact but fully functional LLM
- **Trained from scratch** (no borrowed checkpoints)
- Custom **SentencePiece BPE tokenizer (32k vocab, byte fallback)**
- Modern architecture components:
- RoPE (Rotary Position Embeddings)
- RMSNorm
- SwiGLU FeedForward layer
- FlashAttention (via PyTorch SDPA)
- GQA-ready Attention implementation
- Trained on a **2B-token** mixed corpus (FineWeb + WikiText + Wikipedia)
- Training logs, checkpoints, and plots all included for transparency
- Released under a permissive license for research & learning
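Two of the components above, RMSNorm and the SwiGLU feed-forward, are compact enough to sketch directly. The following is an illustrative PyTorch version for learning purposes, not the repo's exact implementation; module and variable names are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by RMS, no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: W2(SiLU(W1 x) * W3 x), all projections bias-free."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```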
---
## Model Architecture
| Component | Value |
|----------|-------|
| Type | Decoder-only transformer |
| Parameters | ~80M |
| Layers | 16 |
| Embedding dim | 384 |
| Attention heads | 6 |
| KV Heads | 6 |
| MLP Hidden Dim | 1536 (SwiGLU) |
| Max sequence length | 2048 |
| Norm | RMSNorm |
| Positional Encoding | RoPE |
| Tokenizer | SentencePiece BPE (32k vocab, byte fallback) |
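The table above can be collected into a small configuration object. This is a hypothetical sketch; field names are illustrative, not the repo's actual config class:

```python
from dataclasses import dataclass

@dataclass
class MiniLLMConfig:
    # Values taken from the architecture table; names are assumptions.
    n_layers: int = 16
    d_model: int = 384
    n_heads: int = 6
    n_kv_heads: int = 6      # equal to n_heads here, but the attention is GQA-ready
    d_ff: int = 1536         # SwiGLU hidden dim
    max_seq_len: int = 2048
    vocab_size: int = 32000  # 32k BPE vocab

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads
```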
---
## Files in This Repo
- `checkpoints/` — pretrained model state_dict + optimizer state
- `safetensors/` — final consolidated `.safetensors` file
- `logs/` — training logs in JSONL
- `plots/` — train/val loss curves
- `tokenizer.json` — HF-compatible tokenizer
- `spm.model` — SentencePiece model
---
## Quick Usage (HF Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code is required because the architecture is custom
model = AutoModelForCausalLM.from_pretrained("Ashx098/Mini-LLM", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("Ashx098/Mini-LLM")

prompt = "Hello, how are you?"
inputs = tok(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
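The card's inference defaults are `temperature: 0.7` and `top_p: 0.95`. For intuition about what those knobs do, here is a minimal from-scratch nucleus-sampling step over a single logits vector; `sample_top_p` is an illustrative helper, not part of the repo:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7, top_p: float = 0.95) -> int:
    """Temperature + nucleus (top-p) sampling over a 1-D logits vector."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest prefix of tokens whose cumulative probability covers top_p;
    # the first token is always kept (its cumulative minus itself is 0).
    outside_nucleus = cumulative - sorted_probs > top_p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs /= sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()
```

With `transformers`, the same behavior comes from passing `do_sample=True, temperature=0.7, top_p=0.95` to `model.generate`.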
## Training Details
### Optimizer
- **AdamW** (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Learning rate**: 6e-4 (cosine annealing + warmup)
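The schedule above (cosine annealing with warmup, peaking at 6e-4) can be written as a pure function of the step index. Warmup length and minimum LR below are assumptions, since the card does not state them:

```python
import math

def lr_at(step: int, max_steps: int, peak_lr: float = 6e-4,
          warmup_steps: int = 2000, min_lr: float = 6e-5) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr.
    warmup_steps and min_lr are illustrative assumptions."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```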
### Batch & Sequence
- **Global batch size** = 32
- **Sequence length** = 2048
- **Gradient accumulation** = 8
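A quick sanity check on throughput, assuming the global batch size of 32 already includes the 8 accumulation steps (so each forward pass sees 4 sequences):

```python
global_batch = 32   # sequences per optimizer step (assumed to include accumulation)
seq_len = 2048
grad_accum = 8

micro_batch = global_batch // grad_accum         # sequences per forward pass
tokens_per_step = global_batch * seq_len         # tokens consumed per optimizer step
steps_for_2b = 2_000_000_000 // tokens_per_step  # optimizer steps to cover 2B tokens
```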
### Hardware
- Trained on 1Γ NVIDIA A100 80GB
## Training Curve
<p align="center"> <img src="https://huggingface.co/Ashx098/Mini-LLM/resolve/main/phase-1-pretraining/plots/loss_curve.png" width="500"> </p>
Final loss reached: ~3.25
## Example Outputs
**Prompt**: "Hello, how are you"
**Output**: "Hello, how are you?"
**Prompt**: "Python is a programming language that"
**Output**: "Python is a programming language that allows the history..."
## Limitations
- Small model — limited reasoning; prone to hallucination
- Not instruction-tuned
- Not suitable for production usage
- Best viewed as a learning + research artifact
## License
MIT License — free for research, modification, and further training.
## Credits
Developed by **Avinash Mynampati**
Built from scratch using PyTorch + custom training pipeline.
### Want to fine-tune or extend it?
You can:
- Train further with your own dataset
- Add LoRA adapters
- Use it to learn attention, RoPE, SwiGLU, etc.
- Build a tiny instruction-tuned version (coming soon!)
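For the LoRA route, the core idea fits in a few lines. This is an illustrative from-scratch sketch in the spirit of this repo, not the `peft` library's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen Linear with a trainable low-rank update:
    y = W x + scale * B (A x), where A is (r x in) and B is (out x r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no-op at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because `B` starts at zero, a freshly wrapped layer reproduces the base model exactly, and only the small `A`/`B` matrices receive gradients.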
## Contact
For questions or collaborations:
- **GitHub**: [Ashx098](https://github.com/Ashx098)
- **LinkedIn**: [Avinash Mynampati](https://linkedin.com/in/avinash-mynampati)