RonMicro-LLM-Story (Phase 1)

A small GPT-2 style language model trained on the TinyStories dataset for generating short children's stories.

Model Details

  • Model Type: GPT-2 Causal Language Model
  • Parameters: ~21M (20.8M in the released F32 safetensors checkpoint)
  • Training Data: TinyStories (5% subset, ~105K stories)
  • Vocabulary Size: 25,913 tokens
  • Context Length: 512 tokens
  • Training Epochs: 3
  • Language: English
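
The ~21M figure can be sanity-checked against the architecture listed under Training Details. A back-of-the-envelope estimate in Python, assuming tied input/output embeddings and the standard GPT-2 layout (biases and layer norms omitted as negligible):

# Rough parameter count; assumes tied embeddings and standard GPT-2 layout
vocab, d, layers, ffn, ctx = 25_913, 384, 6, 1536, 512

embeddings  = vocab * d + ctx * d   # token + position embeddings
attention   = 4 * d * d             # Q, K, V and output projections per layer
feedforward = 2 * d * ffn           # up- and down-projection per layer

total = embeddings + layers * (attention + feedforward)
print(f"~{total / 1e6:.1f}M parameters")  # ~20.8M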

Training Details

  • Framework: Transformers (Hugging Face)
  • Tokenizer: Custom BPE trained on TinyStories (see the sketch after this section)
  • Architecture:
    • 6 transformer layers
    • 384 embedding dimensions
    • 6 attention heads
    • 1536 FFN dimensions
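
These hyperparameters map directly onto a GPT-2 configuration in transformers. A minimal sketch; the field names are the standard GPT2Config ones, not copied from the repository:

from transformers import GPT2Config, GPT2LMHeadModel

# Config matching the architecture listed above
config = GPT2Config(
    vocab_size=25_913,
    n_positions=512,   # context length
    n_embd=384,        # embedding dimensions
    n_layer=6,         # transformer layers
    n_head=6,          # attention heads
    n_inner=1536,      # FFN dimensions
)
model = GPT2LMHeadModel(config)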

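The custom BPE tokenizer is not described beyond its vocabulary size. One way to reproduce a similar one with the tokenizers library, assuming a byte-level BPE like GPT-2's; here stories is a hypothetical iterable of raw TinyStories texts, and the end-of-text token is an assumed GPT-2 convention:

from tokenizers import ByteLevelBPETokenizer

# Hypothetical: `stories` yields raw TinyStories training texts
tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    stories,
    vocab_size=25_913,
    special_tokens=["<|endoftext|>"],  # assumed GPT-2 style EOS token
)
tokenizer.save_model("ronmicro-tokenizer")
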
Usage

from transformers import pipeline

# Load the model
generator = pipeline("text-generation", model="endurasolution/ronmicro-llm-story")

# Generate a story
story = generator(
    "Once upon a time",
    max_new_tokens=150,
    temperature=0.7,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
    do_sample=True
)

print(story[0]["generated_text"])
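
For more control over decoding, the same sampling settings work with the lower-level model and tokenizer classes. A minimal sketch:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("endurasolution/ronmicro-llm-story")
tokenizer = AutoTokenizer.from_pretrained("endurasolution/ronmicro-llm-story")

# Same sampling settings as the pipeline example above
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.7,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))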

Example Outputs

Prompt: "Once upon a time"

Output: "Once upon a time, there was a little boy named Timmy. He loved to play with his toy cars and trucks all day long..."

Limitations

  • Trained on only 5% of TinyStories (Phase 1)
  • May generate repetitive text occasionally
  • Best for short children's stories (100-200 words)
  • Limited to simple vocabulary and grammar

Next Steps

Phase 2 training is in progress, using a 20% subset of TinyStories and 5 epochs for improved quality.

Citation

Built using the TinyStories dataset:

@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}

License

MIT License
