Model Details
A newer version of the AuroraStories model series.
This model improves story coherence and consistency compared to the earlier 12M-parameter model.
License: Apache-2.0
Uses
This model is intended for educational and research purposes.
It is not yet suitable for high-quality or production-level storytelling.
Limitations
- Limited parameter count (18M) restricts overall capability
- May produce repetitive or simplistic outputs
- Struggles with long-range coherence
- Not a general-purpose language model
How to Get Started with the Model
This model is designed to be used with Nanochat.
python -m scripts.base_eval --model-tag Tiny3 --eval sample
Once the model checkpoint is loaded into Nanochat, this command generates a sample.
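Sampling from a model like this boils down to drawing the next token from the temperature-scaled softmax of the logits. The following is a generic, self-contained sketch of that step (not Nanochat's actual API; the toy logits are made up for illustration):

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick a next-token id from raw logits via temperature sampling."""
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy logits over a 5-token vocabulary; token 2 is strongly favored.
logits = [0.1, 0.2, 5.0, 0.3, 0.1]
print(sample_next_token(logits))
```

Lower temperatures sharpen the distribution toward the highest-logit token; higher temperatures make outputs more varied but less coherent.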
Training Details
Training Data
The model was trained on the TinyStories dataset, using a randomly selected subset of 335M tokens.
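The card does not say how the random subset was drawn; one simple approach is to shuffle documents and keep whole documents until the token budget is reached. A minimal sketch with a toy corpus and whitespace "tokens" (all names here are illustrative, not from the actual pipeline):

```python
import random

def sample_token_budget(docs, budget, seed=42):
    """Shuffle documents and keep whole docs until ~budget tokens are collected."""
    rng = random.Random(seed)
    docs = list(docs)
    rng.shuffle(docs)
    selected, total = [], 0
    for doc in docs:
        n = len(doc.split())  # crude whitespace token count, for the sketch only
        if total + n > budget:
            break
        selected.append(doc)
        total += n
    return selected, total

corpus = [f"story {i} about a small cat" for i in range(1000)]  # toy stand-in
subset, n_tokens = sample_token_budget(corpus, budget=500)
print(len(subset), n_tokens)
```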
Training Procedure
Training was performed using Nanochat on Kaggle:
1st Cell:
python -m scripts.prepare_tinystories
2nd Cell:
python -m scripts.tok_train --vocab-size=4096
3rd Cell:
WANDB_MODE=disabled torchrun --standalone --nproc_per_node=2 -m scripts.base_train \
  --run="tinystories_d6" \
  --depth=6 \
  --device-batch-size=32 \
  --total-batch-size=65536 \
  --max-seq-len=1024 \
  --window-pattern=L \
  --num-iterations=5100 \
  --eval-every=5000 \
  --eval-tokens=4194304 \
  --sample-every=-1 \
  --save-every=5000 \
  --core-metric-every=-1 \
  --model-tag="Tiny3"
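The batch settings above pin down the total training volume: 65,536 tokens per optimizer step over 5,100 steps is ~334M tokens, which matches the ~335M-token dataset (roughly one epoch), and the per-device batch exactly fills the global batch, so no gradient accumulation is needed:

```python
# Tokens processed over the whole run.
total_batch_size = 65536   # tokens per optimizer step (--total-batch-size)
num_iterations = 5100      # optimizer steps (--num-iterations)
tokens_seen = total_batch_size * num_iterations
print(f"{tokens_seen:,}")  # ~334M tokens, about one pass over the data

# Tokens supplied per step directly by the two GPUs.
device_batch_size = 32     # sequences per GPU (--device-batch-size)
max_seq_len = 1024         # tokens per sequence (--max-seq-len)
n_gpus = 2                 # torchrun --nproc_per_node=2
per_step = device_batch_size * max_seq_len * n_gpus
grad_accum_steps = total_batch_size // per_step
print(grad_accum_steps)    # 1 -> no gradient accumulation
```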
Evaluation
Final Training Metrics
- Train Loss: 1.257
- Train BPB: 1.814
- Validation Loss: 0.312
- Validation BPB: 0.451
- Validation Perplexity: 1.37
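These metrics are mutually consistent: each bits-per-byte figure equals the corresponding cross-entropy loss (in nats) divided by ln 2, and the perplexity is exp of the validation loss, up to rounding:

```python
import math

train_loss, train_bpb = 1.257, 1.814
val_loss, val_bpb = 0.312, 0.451

# BPB = loss_in_nats / ln(2); matches the published figures to rounding.
print(train_loss / math.log(2))  # ~1.814
print(val_loss / math.log(2))    # ~0.450

# Perplexity = exp(validation loss in nats).
print(math.exp(val_loss))        # ~1.37
```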
Training Curves (plot not reproduced in this card)
Environmental Impact
Hardware Type: 2× NVIDIA T4 GPUs
Hours used: 1.52
Cloud Provider: Kaggle
Carbon Emitted: ~0.13 kg CO₂
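The ~0.13 kg figure is consistent with a standard power-times-carbon-intensity estimate, assuming the T4's 70 W TDP at full utilization and a grid intensity of ~0.6 kg CO₂/kWh (both are assumptions, not stated in this card):

```python
n_gpus = 2
tdp_watts = 70           # NVIDIA T4 TDP (assumes full utilization)
hours = 1.52
carbon_intensity = 0.6   # kg CO2 per kWh (assumed grid average)

energy_kwh = n_gpus * tdp_watts * hours / 1000
emissions_kg = energy_kwh * carbon_intensity
print(round(energy_kwh, 3), round(emissions_kg, 2))  # ~0.213 kWh, ~0.13 kg
```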
Model Architecture
sequence_len: 1024
vocab_size: 4096
n_layer: 6
n_head: 3
n_kv_head: 3
n_embd: 384
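A back-of-envelope parameter estimate from these hyperparameters, assuming vanilla GPT blocks (4·d² for attention projections, 8·d² for a 4×-wide MLP) and untied input/output embeddings; this simplified formula gives ~13.8M, so the 18M count quoted earlier would include components it ignores (the exact total depends on Nanochat's architectural details):

```python
n_layer, n_embd, vocab = 6, 384, 4096

attn = 4 * n_embd**2             # Q, K, V, and output projections
mlp = 8 * n_embd**2              # assumed two-matrix MLP with 4x hidden width
per_layer = attn + mlp
embeddings = 2 * vocab * n_embd  # untied token embedding + LM head

total = n_layer * per_layer + embeddings
print(f"{total:,}")              # ~13.8M under these vanilla-GPT assumptions
```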
Hardware
2× NVIDIA T4 GPUs
Software
Nanochat
