---
library_name: transformers
datasets:
  - roneneldan/TinyStories
---

# olmostories-8m

A small language model trained on the TinyStories dataset using the OLMo 2 architecture.

Training took around 2.5 hours on a single A100 (80 GB).

```python
from transformers import Olmo2Config

config = Olmo2Config(
    vocab_size=5000,
    hidden_size=288,
    intermediate_size=720,
    num_hidden_layers=6,
    num_attention_heads=6,
    num_key_value_heads=6,
    max_position_embeddings=256,
    initializer_range=0.02,
    attention_dropout=0.1,
)
```
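As a sanity check on the "8m" in the name, here is a rough back-of-the-envelope parameter count from the config above. It is a sketch, not authoritative: it assumes untied input/output embeddings and a SwiGLU-style MLP (gate, up, and down projections, as in the OLMo 2 reference implementation), and it ignores the small norm weight vectors.

```python
# Rough parameter count for the config above (assumptions noted in the text).
vocab_size = 5000
hidden = 288
intermediate = 720
layers = 6

embeddings = vocab_size * hidden  # input embedding table
lm_head = vocab_size * hidden     # output projection, assumed untied

# Attention: Q, K, V, O projections; kv heads == attention heads, so no GQA savings
attn = 4 * hidden * hidden

# SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden)
mlp = 2 * hidden * intermediate + intermediate * hidden

total = embeddings + lm_head + layers * (attn + mlp)
print(f"~{total / 1e6:.1f}M parameters")  # prints ~8.6M parameters
```

With tied embeddings the count drops by about 1.4M, so either way the model sits in the ~7–9M range the name suggests.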