πŸ¦– REX-2: 50M Parameter Story Engine

REX-2 is a 50M parameter decoder-only Transformer language model, trained from scratch for short-form creative storytelling. It demonstrates an end-to-end small-scale LLM pipeline, optimized for limited compute environments.


πŸ”— Links

πŸ€— Hugging Face Β· GitHub Β· πŸš€ Streamlit


πŸ” Model Overview

  • Architecture: Decoder-only Transformer (GPT-style)
  • Parameters: 50.3M
  • Layers: 6
  • Attention Heads: 8
  • Embedding Dimension: 512
  • Context Length: 256 tokens
  • Tokenizer: GPT-2 (50,257 vocab)

✨ Intended Use

βœ… Primary Use Cases

  • Short story generation
  • Educational demonstrations of LLM training
  • Lightweight experimentation with Transformer models
  • Research on small-scale language models

⚠️ Limitations

  • Context drift beyond ~60 tokens
  • Inconsistent characters in longer outputs
  • Limited factual/world knowledge
  • Occasional logical gaps

🚫 Out-of-Scope Use

  • Long-form coherent storytelling
  • Factual Q&A or knowledge-based tasks
  • Production-grade applications requiring high reliability

πŸ“Š Training Details

Dataset

  • TinyStories Dataset

Training Strategy

1. Base Training

  • ~3,000 steps
  • Learned grammar and structure

2. Stream Polishing

  • Streaming-based exposure
  • Reduced overfitting
  • Improved generalization

3. Fine-Tuning

  • Targeted optimization
  • Final loss: 2.2 – 2.5
  • Improved narrative coherence

βš™οΈ Training Configuration

  • Framework: PyTorch
  • Hardware: Tesla T4 (single GPU)
  • Optimization: Multi-stage training pipeline
  • Tokenizer: GPT-2 tokenizer

πŸ§ͺ Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "IqRogueRex/rex-2-50m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # required for temperature and top_k to take effect
    temperature=0.8,
    top_k=50,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

🧾 Sample Output

```
Once upon a time, a little robot named Rex found a glowing stone in the forest.
When he touched it, the trees began to whisper secrets of the stars...
```

πŸ“ˆ Evaluation

  • Training Loss: 2.2 – 2.5
  • Evaluation is primarily qualitative (story coherence and fluency)
  • No benchmark datasets used

🧠 Design Philosophy

REX-2 is built to demonstrate that:

  • Meaningful LLMs can be trained from scratch at small scale
  • Efficient pipelines can compensate for limited compute
  • Hands-on model building is key for understanding deep learning systems

⚠️ Bias, Risks, and Safety

  • Model is trained on a limited dataset β†’ may reflect dataset biases
  • Outputs are not factually reliable
  • Should not be used in sensitive or high-stakes applications

πŸš€ Future Work

  • Scale to 100M+ parameters
  • Improve long-context handling
  • Add instruction tuning
  • Expand dataset diversity
  • Introduce evaluation benchmarks

πŸ‘¨β€πŸ’» Author

Chinmay V Chatradamath


πŸ“œ License

MIT License
