# 🦖 REX-2: 50M Parameter Story Engine
REX-2 is a 50M-parameter, decoder-only Transformer language model trained from scratch for short-form creative storytelling. It demonstrates an end-to-end small-scale LLM pipeline optimized for limited-compute environments.
## 📖 Model Overview
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: 50.3M
- Layers: 6
- Attention Heads: 8
- Embedding Dimension: 512
- Context Length: 256 tokens
- Tokenizer: GPT-2 (50,257 vocab)
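As a sanity check on the numbers above, a back-of-envelope parameter estimate for a GPT-style decoder can be derived from the layer count, embedding dimension, and vocabulary size. This is an illustrative sketch only: the function name and the assumption of a 4x MLP expansion are mine, and the exact 50.3M figure will differ from the estimate depending on weight tying, biases, and layer-norm parameters, which the card does not specify.

```python
def estimate_gpt_params(n_layers, d_model, vocab_size, n_ctx, tie_embeddings=True):
    """Rough parameter estimate for a GPT-style decoder-only Transformer.

    Ignores biases and layer-norm weights, which add a small fraction.
    Assumes a standard 4x MLP expansion in each block.
    """
    token_emb = vocab_size * d_model   # input embedding matrix
    pos_emb = n_ctx * d_model          # learned positional embeddings
    # Per block: attention (Q, K, V, output proj) = 4 * d^2,
    # MLP with 4x expansion = (d * 4d) + (4d * d) = 8 * d^2
    per_block = 12 * d_model * d_model
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return token_emb + pos_emb + n_layers * per_block + lm_head

# REX-2's stated configuration: 6 layers, d=512, GPT-2 vocab, 256-token context
print(estimate_gpt_params(6, 512, 50257, 256))
```

The estimate lands in the mid-40M range with tied embeddings; an untied output head alone adds another ~25.7M, so the reported 50.3M total is plausible once the unlisted details are accounted for.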
## ✨ Intended Use
### ✅ Primary Use Cases
- Short story generation
- Educational demonstrations of LLM training
- Lightweight experimentation with Transformer models
- Research on small-scale language models
### ⚠️ Limitations
- Context drift: coherence degrades beyond ~60 generated tokens
- Inconsistent characters in longer outputs
- Limited factual/world knowledge
- Occasional logical gaps
### 🚫 Out-of-Scope Use
- Long-form coherent storytelling
- Factual Q&A or knowledge-based tasks
- Production-grade applications requiring high reliability
## 📊 Training Details
### Dataset
- TinyStories Dataset
### Training Strategy
1. Base Training
   - ~3,000 steps
   - Learned grammar and structure
2. Stream Polishing
   - Streaming-based exposure
   - Reduced overfitting
   - Improved generalization
3. Fine-Tuning
   - Targeted optimization
   - Final loss: 2.2–2.5
   - Improved narrative coherence
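The three-stage pipeline above can be sketched as a simple stage schedule. The stage names come from this card, but the step counts (beyond the stated ~3,000 base steps), learning rates, and loop body are hypothetical placeholders, not the actual training code.

```python
# Hypothetical sketch of REX-2's multi-stage training schedule.
# Stage names follow the card; step counts after stage 1 and all
# learning rates are illustrative placeholders.
STAGES = [
    {"name": "base_training", "steps": 3000, "lr": 3e-4},  # grammar & structure
    {"name": "stream_polish", "steps": 1000, "lr": 1e-4},  # streamed fresh data
    {"name": "fine_tuning",   "steps": 500,  "lr": 5e-5},  # coherence-focused
]

def run_pipeline(train_step, stages=STAGES):
    """Run each stage in order; `train_step(step, lr)` performs one update."""
    log = []
    for stage in stages:
        for step in range(stage["steps"]):
            train_step(step, stage["lr"])
        log.append((stage["name"], stage["steps"]))
    return log
```

Encoding the schedule as data rather than three separate scripts makes it easy to resume from a given stage or adjust per-stage hyperparameters.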
## ⚙️ Training Configuration
- Framework: PyTorch
- Hardware: Tesla T4 (single GPU)
- Optimization: Multi-stage training pipeline
- Tokenizer: GPT-2 tokenizer
## 🧪 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "IqRogueRex/rex-2-50m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # required for temperature/top_k to take effect
    temperature=0.8,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
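To clarify what `temperature=0.8` and `top_k=50` do during generation, here is a minimal, dependency-free sketch of top-k sampling over a logits vector. This is an illustration of the technique, not the `generate()` implementation itself, and the function name is mine.

```python
import math
import random

def top_k_sample(logits, k=50, temperature=0.8, rng=random):
    """Sample a token id: keep the k highest logits, apply temperature, softmax."""
    # Keep the indices of the k largest logits, discard the rest
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # numerically stable softmax numerators
    return rng.choices(top, weights=weights, k=1)[0]
```

Restricting sampling to the top k tokens keeps a small model like REX-2 from occasionally emitting very unlikely tokens, while the temperature trades diversity against determinism.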
## 🧾 Sample Output
> Once upon a time, a little robot named Rex found a glowing stone in the forest.
> When he touched it, the trees began to whisper secrets of the stars...
## 📈 Evaluation
- Training loss: 2.2–2.5
- Evaluation is primarily qualitative (story coherence and fluency)
- No benchmark datasets used
## 🧠 Design Philosophy
REX-2 is built to demonstrate that:
- Capable language models can be trained from scratch at small scale
- Efficient pipelines can compensate for limited compute
- Hands-on model building is key to understanding deep learning systems
## ⚠️ Bias, Risks, and Safety
- Trained on a single, limited dataset; outputs may reflect its biases
- Outputs are not factually reliable
- Should not be used in sensitive or high-stakes applications
## 🚀 Future Work
- Scale to 100M+ parameters
- Improve long-context handling
- Add instruction tuning
- Expand dataset diversity
- Introduce evaluation benchmarks
## 👨‍💻 Author
Chinmay V Chatradamath
## 📄 License
MIT License