# 🦖 REX-2: 50M Parameter Story Engine
REX-2 is a 50M-parameter, decoder-only Transformer language model trained from scratch for short-form creative storytelling. It demonstrates an end-to-end small-scale LLM pipeline optimized for limited-compute environments.
## 📖 Model Overview
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: 50.3M
- Layers: 6
- Attention Heads: 8
- Embedding Dimension: 512
- Context Length: 256 tokens
- Tokenizer: GPT-2 (50,257 vocab)
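As a sanity check on the numbers above, a back-of-envelope parameter estimate for a GPT-style decoder can be derived from the layer count, embedding dimension, and vocabulary size. This is an illustrative sketch only: the function name and the assumption of a 4x MLP expansion are mine, and the exact 50.3M figure will differ from the estimate depending on weight tying, biases, and layer-norm parameters, which the card does not specify.

```python
def estimate_gpt_params(n_layers, d_model, vocab_size, n_ctx, tie_embeddings=True):
    """Rough parameter estimate for a GPT-style decoder-only Transformer.

    Ignores biases and layer-norm weights, which add a small fraction.
    Assumes a standard 4x MLP expansion in each block.
    """
    token_emb = vocab_size * d_model   # input embedding matrix
    pos_emb = n_ctx * d_model          # learned positional embeddings
    # Per block: attention (Q, K, V, output proj) = 4 * d^2,
    # MLP with 4x expansion = (d * 4d) + (4d * d) = 8 * d^2
    per_block = 12 * d_model * d_model
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return token_emb + pos_emb + n_layers * per_block + lm_head

# REX-2's stated configuration: 6 layers, d=512, GPT-2 vocab, 256-token context
print(estimate_gpt_params(6, 512, 50257, 256))
```

The estimate lands in the mid-40M range with tied embeddings; an untied output head alone adds another ~25.7M, so the reported 50.3M total is plausible once the unlisted details are accounted for.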
## ✨ Intended Use
### ✅ Primary Use Cases
- Short story generation
- Educational demonstrations of LLM training
- Lightweight experimentation with Transformer models
- Research on small-scale language models
### ⚠️ Limitations
- Context drift: coherence degrades beyond ~60 generated tokens
- Inconsistent characters in longer outputs
- Limited factual/world knowledge
- Occasional logical gaps
### 🚫 Out-of-Scope Use
- Long-form coherent storytelling
- Factual Q&A or knowledge-based tasks
- Production-grade applications requiring high reliability
## 📊 Training Details
### Dataset
- TinyStories Dataset
### Training Strategy
1. Base Training
   - ~3,000 steps
   - Learned grammar and structure
2. Stream Polishing
   - Streaming-based exposure
   - Reduced overfitting
   - Improved generalization
3. Fine-Tuning
   - Targeted optimization
   - Final loss: 2.2–2.5
   - Improved narrative coherence
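The three-stage pipeline above can be sketched as a simple stage schedule. The stage names come from this card, but the step counts (beyond the stated ~3,000 base steps), learning rates, and loop body are hypothetical placeholders, not the actual training code.

```python
# Hypothetical sketch of REX-2's multi-stage training schedule.
# Stage names follow the card; step counts after stage 1 and all
# learning rates are illustrative placeholders.
STAGES = [
    {"name": "base_training", "steps": 3000, "lr": 3e-4},  # grammar & structure
    {"name": "stream_polish", "steps": 1000, "lr": 1e-4},  # streamed fresh data
    {"name": "fine_tuning",   "steps": 500,  "lr": 5e-5},  # coherence-focused
]

def run_pipeline(train_step, stages=STAGES):
    """Run each stage in order; `train_step(step, lr)` performs one update."""
    log = []
    for stage in stages:
        for step in range(stage["steps"]):
            train_step(step, stage["lr"])
        log.append((stage["name"], stage["steps"]))
    return log
```

Encoding the schedule as data rather than three separate scripts makes it easy to resume from a given stage or adjust per-stage hyperparameters.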
## ⚙️ Training Configuration
- Framework: PyTorch
- Hardware: Tesla T4 (single GPU)
- Optimization: Multi-stage training pipeline
- Tokenizer: GPT-2 tokenizer
## 🧪 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "IqRogueRex/rex-2-50m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # required for temperature/top_k to take effect
    temperature=0.8,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
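To clarify what `temperature=0.8` and `top_k=50` do during generation, here is a minimal, dependency-free sketch of top-k sampling over a logits vector. This is an illustration of the technique, not the `generate()` implementation itself, and the function name is mine.

```python
import math
import random

def top_k_sample(logits, k=50, temperature=0.8, rng=random):
    """Sample a token id: keep the k highest logits, apply temperature, softmax."""
    # Keep the indices of the k largest logits, discard the rest
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # numerically stable softmax numerators
    return rng.choices(top, weights=weights, k=1)[0]
```

Restricting sampling to the top k tokens keeps a small model like REX-2 from occasionally emitting very unlikely tokens, while the temperature trades diversity against determinism.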
## 🧾 Sample Output
> Once upon a time, a little robot named Rex found a glowing stone in the forest.
> When he touched it, the trees began to whisper secrets of the stars...
## 📈 Evaluation
- Training loss: 2.2–2.5
- Evaluation is primarily qualitative (story coherence and fluency)
- No benchmark datasets used
## 🧠 Design Philosophy
REX-2 is built to demonstrate that:
- Capable language models can be trained from scratch at small scale
- Efficient pipelines can compensate for limited compute
- Hands-on model building is key to understanding deep learning systems
## ⚠️ Bias, Risks, and Safety
- Trained on a single, limited dataset; outputs may reflect its biases
- Outputs are not factually reliable
- Should not be used in sensitive or high-stakes applications
## 🚀 Future Work
- Scale to 100M+ parameters
- Improve long-context handling
- Add instruction tuning
- Expand dataset diversity
- Introduce evaluation benchmarks
## 👨‍💻 Author
Chinmay V Chatradamath
## 📄 License
MIT License