Model Details

A newer version of the AuroraStories model series, this model improves story coherence and consistency compared to the earlier 12M model.

License: Apache-2.0

Uses

This model is intended for educational and research purposes.
It is not yet suitable for high-quality or production-level storytelling.

Limitations

  • Limited parameter count (18M) restricts overall capability
  • May produce repetitive or simplistic outputs
  • Struggles with long-range coherence
  • Not a general-purpose language model

How to Get Started with the Model

This model is designed to be used with Nanochat.

python -m scripts.base_eval --model-tag Tiny3 --eval sample

This will generate a sample once the model is loaded into Nanochat.
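For illustration, the kind of sampling loop such a command runs can be sketched as follows. This is a generic sketch, not Nanochat's actual implementation: `toy_logits` is a stand-in for the model's forward pass, and the temperature/top-k values are arbitrary.

```python
import math
import random

random.seed(0)

VOCAB_SIZE = 4096  # matches the tokenizer vocabulary used for this model

def toy_logits(context):
    # Stand-in for a real forward pass: one logit per vocabulary entry.
    return [random.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]

def sample_next(logits, temperature=0.8, top_k=50):
    # Keep only the top-k logits, then sample from their softmax.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(top, weights=weights, k=1)[0]

def generate(prompt_tokens, max_new_tokens=16):
    # Autoregressive loop: append one sampled token at a time.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(sample_next(toy_logits(tokens)))
    return tokens

out = generate([1, 2, 3])
```

With a real model, `toy_logits` would be replaced by a forward pass over the current token sequence.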

Training Details

Training Data

The TinyStories dataset was used, with 335M tokens randomly selected.

Training Procedure

Training was performed using Nanochat on Kaggle:

1st Cell:

python -m scripts.prepare_tinystories

2nd Cell:

python -m scripts.tok_train --vocab-size=4096
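`tok_train` builds a byte-pair-style tokenizer with a 4096-entry vocabulary. The core BPE training idea, repeatedly merging the most frequent adjacent pair into a new symbol, can be sketched as follows (an illustration only, not Nanochat's tokenizer code):

```python
from collections import Counter

def most_frequent_pair(seq):
    # Count every adjacent pair and return the most common one.
    return Counter(zip(seq, seq[1:])).most_common(1)[0][0]

def bpe_train(seq, num_merges):
    # Repeatedly merge the most frequent adjacent pair into a new symbol id.
    merges = []
    next_id = max(seq) + 1
    for _ in range(num_merges):
        if len(seq) < 2:
            break
        pair = most_frequent_pair(seq)
        merges.append((pair, next_id))
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
        next_id += 1
    return merges, seq

text = b"the cat sat on the mat"
merges, encoded = bpe_train(list(text), num_merges=5)
```

A real tokenizer would run this over the whole corpus until the vocabulary reaches the target size (here 4096).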

3rd Cell:

WANDB_MODE=disabled torchrun --standalone --nproc_per_node=2 -m scripts.base_train \
  --run="tinystories_d6" \
  --depth=6 \
  --device-batch-size=32 \
  --total-batch-size=65536 \
  --max-seq-len=1024 \
  --window-pattern=L \
  --num-iterations=5100 \
  --eval-every=5000 \
  --eval-tokens=4194304 \
  --sample-every=-1 \
  --save-every=5000 \
  --core-metric-every=-1 \
  --model-tag="Tiny3"
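As a sanity check, the token budget implied by these flags is tokens per optimizer step times the number of iterations, which lands just under the ~335M tokens quoted in the Training Data section:

```python
total_batch_size = 65536  # tokens per optimizer step (--total-batch-size)
num_iterations = 5100     # optimizer steps (--num-iterations)

total_tokens = total_batch_size * num_iterations
print(f"{total_tokens:,} tokens (~{total_tokens / 1e6:.0f}M)")
# → 334,233,600 tokens (~334M)
```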

Evaluation

Final Training Metrics

  • Train Loss: 1.257
  • Train BPB: 1.814
  • Validation Loss: 0.312
  • Validation BPB: 0.451
  • Perplexity: 1.37
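These numbers are internally consistent: perplexity is e raised to the validation loss (for a natural-log cross-entropy), and the reported BPB values match loss / ln 2 to within rounding (which assumes roughly one byte per token on this data; that ratio is an inference, not stated in the logs):

```python
import math

val_loss = 0.312
perplexity = math.exp(val_loss)       # e^loss for nat-based cross-entropy
print(round(perplexity, 2))           # → 1.37

train_loss = 1.257
train_bpb = train_loss / math.log(2)  # nats → bits, assuming ~1 byte/token
```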

Training Curves

[Training charts]

Environmental Impact

Hardware Type: NVIDIA T4 GPUs (×2)

Hours used: 1.52

Cloud Provider: Kaggle

Carbon Emitted: ~0.13 kg CO₂
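The emissions figure is consistent with a rough back-of-envelope estimate. The power and grid-intensity values below are assumptions (the T4's 70 W TDP and an average grid intensity of ~0.6 kg CO₂/kWh), not measurements:

```python
hours = 1.52
num_gpus = 2
gpu_power_kw = 0.070        # assumed: NVIDIA T4 TDP is 70 W
grid_kg_per_kwh = 0.6       # assumed average grid carbon intensity

energy_kwh = hours * num_gpus * gpu_power_kw
carbon_kg = energy_kwh * grid_kg_per_kwh
print(round(carbon_kg, 2))  # → 0.13
```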

Model Architecture

sequence_len: 1024

vocab_size: 4096

n_layer: 6

n_head: 3

n_kv_head: 3

n_embd: 384
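A few quantities follow directly from these hyperparameters; for example, the per-head dimension is n_embd / n_head:

```python
sequence_len = 1024
vocab_size = 4096
n_layer = 6
n_head = 3
n_kv_head = 3   # equal to n_head, i.e. standard multi-head attention
n_embd = 384

head_dim = n_embd // n_head
print(head_dim)  # → 128
```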

Hardware

2× NVIDIA T4 GPUs

Software

Nanochat


Dataset used to train ThatHungarian/AuroraStories-18M: TinyStories