Model Details
A newer version of the AuroraStories model series.
This model improves story coherence and consistency compared to the earlier 12M-parameter model.
License: Apache-2.0
Uses
This model is intended for educational and research purposes.
It is not yet suitable for high-quality or production-level storytelling.
Limitations
- Limited parameter count (18M) restricts overall capability
- May produce repetitive or simplistic outputs
- Struggles with long-range coherence
- Not a general-purpose language model
How to Get Started with the Model
This model is designed to be used with Nanochat.
python -m scripts.base_eval --model-tag Tiny3 --eval sample
Once the model checkpoint is loaded into Nanochat, this command generates a sample.
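Sampling from a model like this boils down to drawing the next token from the temperature-scaled softmax of the logits. The following is a generic, self-contained sketch of that step (not Nanochat's actual API; the toy logits are made up for illustration):

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick a next-token id from raw logits via temperature sampling."""
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy logits over a 5-token vocabulary; token 2 is strongly favored.
logits = [0.1, 0.2, 5.0, 0.3, 0.1]
print(sample_next_token(logits))
```

Lower temperatures sharpen the distribution toward the highest-logit token; higher temperatures make outputs more varied but less coherent.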
Training Details
Training Data
The model was trained on the TinyStories dataset, using a randomly selected subset of 335M tokens.
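The card does not say how the random subset was drawn; one simple approach is to shuffle documents and keep whole documents until the token budget is reached. A minimal sketch with a toy corpus and whitespace "tokens" (all names here are illustrative, not from the actual pipeline):

```python
import random

def sample_token_budget(docs, budget, seed=42):
    """Shuffle documents and keep whole docs until ~budget tokens are collected."""
    rng = random.Random(seed)
    docs = list(docs)
    rng.shuffle(docs)
    selected, total = [], 0
    for doc in docs:
        n = len(doc.split())  # crude whitespace token count, for the sketch only
        if total + n > budget:
            break
        selected.append(doc)
        total += n
    return selected, total

corpus = [f"story {i} about a small cat" for i in range(1000)]  # toy stand-in
subset, n_tokens = sample_token_budget(corpus, budget=500)
print(len(subset), n_tokens)
```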
Training Procedure
Training was performed using Nanochat on Kaggle:
1st Cell:
python -m scripts.prepare_tinystories
2nd Cell:
python -m scripts.tok_train --vocab-size=4096
3rd Cell:
WANDB_MODE=disabled torchrun --standalone --nproc_per_node=2 -m scripts.base_train \
  --run="tinystories_d6" \
  --depth=6 \
  --device-batch-size=32 \
  --total-batch-size=65536 \
  --max-seq-len=1024 \
  --window-pattern=L \
  --num-iterations=5100 \
  --eval-every=5000 \
  --eval-tokens=4194304 \
  --sample-every=-1 \
  --save-every=5000 \
  --core-metric-every=-1 \
  --model-tag="Tiny3"
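The batch settings above pin down the total training volume: 65,536 tokens per optimizer step over 5,100 steps is ~334M tokens, which matches the ~335M-token dataset (roughly one epoch), and the per-device batch exactly fills the global batch, so no gradient accumulation is needed:

```python
# Tokens processed over the whole run.
total_batch_size = 65536   # tokens per optimizer step (--total-batch-size)
num_iterations = 5100      # optimizer steps (--num-iterations)
tokens_seen = total_batch_size * num_iterations
print(f"{tokens_seen:,}")  # ~334M tokens, about one pass over the data

# Tokens supplied per step directly by the two GPUs.
device_batch_size = 32     # sequences per GPU (--device-batch-size)
max_seq_len = 1024         # tokens per sequence (--max-seq-len)
n_gpus = 2                 # torchrun --nproc_per_node=2
per_step = device_batch_size * max_seq_len * n_gpus
grad_accum_steps = total_batch_size // per_step
print(grad_accum_steps)    # 1 -> no gradient accumulation
```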
Evaluation
Final Training Metrics
- Train Loss: 1.257
- Train BPB: 1.814
- Validation Loss: 0.312
- Validation BPB: 0.451
- Validation Perplexity: 1.37
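These metrics are mutually consistent: each bits-per-byte figure equals the corresponding cross-entropy loss (in nats) divided by ln 2, and the perplexity is exp of the validation loss, up to rounding:

```python
import math

train_loss, train_bpb = 1.257, 1.814
val_loss, val_bpb = 0.312, 0.451

# BPB = loss_in_nats / ln(2); matches the published figures to rounding.
print(train_loss / math.log(2))  # ~1.814
print(val_loss / math.log(2))    # ~0.450

# Perplexity = exp(validation loss in nats).
print(math.exp(val_loss))        # ~1.37
```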
Training Curves (plot not reproduced in this card)
Environmental Impact
Hardware Type: 2× NVIDIA T4 GPUs
Hours used: 1.52
Cloud Provider: Kaggle
Carbon Emitted: ~0.13 kg CO₂
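The ~0.13 kg figure is consistent with a standard power-times-carbon-intensity estimate, assuming the T4's 70 W TDP at full utilization and a grid intensity of ~0.6 kg CO₂/kWh (both are assumptions, not stated in this card):

```python
n_gpus = 2
tdp_watts = 70           # NVIDIA T4 TDP (assumes full utilization)
hours = 1.52
carbon_intensity = 0.6   # kg CO2 per kWh (assumed grid average)

energy_kwh = n_gpus * tdp_watts * hours / 1000
emissions_kg = energy_kwh * carbon_intensity
print(round(energy_kwh, 3), round(emissions_kg, 2))  # ~0.213 kWh, ~0.13 kg
```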
Model Architecture
sequence_len: 1024
vocab_size: 4096
n_layer: 6
n_head: 3
n_kv_head: 3
n_embd: 384
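A back-of-envelope parameter estimate from these hyperparameters, assuming vanilla GPT blocks (4·d² for attention projections, 8·d² for a 4×-wide MLP) and untied input/output embeddings; this simplified formula gives ~13.8M, so the 18M count quoted earlier would include components it ignores (the exact total depends on Nanochat's architectural details):

```python
n_layer, n_embd, vocab = 6, 384, 4096

attn = 4 * n_embd**2             # Q, K, V, and output projections
mlp = 8 * n_embd**2              # assumed two-matrix MLP with 4x hidden width
per_layer = attn + mlp
embeddings = 2 * vocab * n_embd  # untied token embedding + LM head

total = n_layer * per_layer + embeddings
print(f"{total:,}")              # ~13.8M under these vanilla-GPT assumptions
```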
Hardware
2× NVIDIA T4 GPUs
Software
Nanochat
