---
language:
- en
- it
license: mit
tags:
- text-generation
- educational
- transformer
- pytorch
- safetensors
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---
# MiniTransformer v3
A small educational transformer model trained from scratch for text generation tasks.
## Model Description
MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.
**This is an educational model**: it is designed to help you understand transformer architectures and training processes, not for production use.
## Architecture
- **Parameters:** 43.9M
- **Architecture:** Decoder-only transformer
- **Embedding Dimension:** 512
- **Attention Heads:** 4
- **Layers:** 4
- **Context Length:** 128 tokens
- **Vocabulary:** BERT tokenizer (30,522 tokens)
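For orientation, these hyperparameters can be collected into a small configuration object. The `MiniTransformerConfig` name and dataclass layout below are illustrative only, not part of the released code.
```python
from dataclasses import dataclass

@dataclass
class MiniTransformerConfig:
    # Hyperparameters from the list above; the class name is illustrative only.
    vocab_size: int = 30522    # bert-base-uncased vocabulary size
    d_model: int = 512         # embedding dimension
    n_heads: int = 4           # attention heads
    n_layers: int = 4          # decoder blocks
    context_length: int = 128  # maximum sequence length in tokens
```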
## Training Details
### Training Data
- Generic question-answer pairs with diverse system prompts
- Trained using a sliding-window approach with a stride of 32 tokens (sketched below)
- Train/test split: 90/10
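The windowing step can be sketched as follows, with a 128-token window matching the context length above. `sliding_windows` is an illustrative helper, not the actual preprocessing code.
```python
def sliding_windows(token_ids, window=128, stride=32):
    # Yield overlapping fixed-length chunks from one long token sequence.
    # Illustrative sketch of the stride-32 windowing described above.
    for start in range(0, len(token_ids) - window + 1, stride):
        yield token_ids[start:start + window]
```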
### Training Procedure
- **Optimizer:** AdamW (fused, learning rate: 3e-4)
- **Batch Size:** 128
- **Epochs:** 50
- **Mixed Precision:** FP16 (AMP enabled)
- **Hardware:** NVIDIA A10 GPU
- **Final Train Loss:** 0.0024
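Under those settings, a single training step looks roughly like the sketch below; `model`, `train_loader`, and the loss handling are placeholders rather than the exact training script.
```python
import torch

# Sketch of one FP16 AMP training step with fused AdamW (placeholders for model/data).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)
scaler = torch.cuda.amp.GradScaler()
loss_fn = torch.nn.CrossEntropyLoss()

for input_ids, targets in train_loader:          # batches of 128 sequences
    input_ids, targets = input_ids.cuda(), targets.cuda()
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(input_ids)                # (batch, seq, vocab)
        loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```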
### Framework
- PyTorch 2.0+ with `torch.compile()` optimization
- Transformers library tokenizer
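Both pieces amount to a couple of lines of setup; this is a sketch, and `model` is a placeholder for an instantiated `nn.Module`.
```python
import torch
from transformers import AutoTokenizer

# PyTorch 2.0+ graph compilation of the model's forward pass
model = torch.compile(model)

# Same BERT tokenizer as in the Usage section below
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```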
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer used during training
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the model (you'll need to download the checkpoint and define the MiniTransformer class)
# model = MiniTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt"))
# model.eval()

# Prepare a prompt
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate text (see the decoding sketch below)
```
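Because the checkpoint is loaded into a custom `MiniTransformer` class, generation has to be written against that class's forward pass. The loop below is a minimal greedy-decoding sketch, assuming `model(input_ids)` returns logits of shape `(batch, sequence, vocab)`; the `generate` helper and its stopping condition are illustrative, not part of the released code.
```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, device="cpu"):
    # Greedy decoding sketch; assumes model(input_ids) -> logits of shape (B, T, V).
    model.eval()
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    for _ in range(max_new_tokens):
        context = input_ids[:, -128:]            # stay within the 128-token context window
        logits = model(context)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
        if next_token.item() == tokenizer.sep_token_id:
            break                                # stop on [SEP]; adjust to how the model was trained
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Example:
# print(generate(model, tokenizer, "Your prompt here"))
```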