---
license: mit
language:
  - en
tags:
  - text-generation
  - gpt2
  - stories
  - children-stories
  - tinystories
datasets:
  - roneneldan/TinyStories
widget:
  - text: Once upon a time
    example_title: Story Beginning
  - text: The little girl loved to
    example_title: Character Story
  - text: In a magical forest,
    example_title: Fantasy Setting
---

# RonMicro-LLM-Story (Phase 1)

A small GPT-2-style language model trained on the TinyStories dataset to generate children's stories.

## Model Details

- **Model Type:** GPT-2 Causal Language Model
- **Parameters:** ~40M
- **Training Data:** TinyStories (5% subset, ~105K stories)
- **Vocabulary Size:** 25,913 tokens
- **Context Length:** 512 tokens
- **Training Epochs:** 3
- **Language:** English

## Training Details

- **Framework:** Transformers (Hugging Face)
- **Tokenizer:** Custom BPE trained on TinyStories
- **Architecture:**
  - 6 transformer layers
  - 384 embedding dimensions
  - 6 attention heads
  - 1536 FFN dimensions
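
Assuming Hugging Face `GPT2Config`-style field names (the actual training configuration is not published here), the hyperparameters above map roughly onto the following sketch:

```python
# Hyperparameters from this model card, expressed as a plain dict
# using GPT2Config-style field names. This is a reconstruction for
# illustration, not the published training config.
config = {
    "vocab_size": 25913,  # custom BPE tokenizer
    "n_positions": 512,   # context length
    "n_embd": 384,        # embedding dimension
    "n_layer": 6,         # transformer layers
    "n_head": 6,          # attention heads
    "n_inner": 1536,      # FFN dimension, 4 * n_embd (the usual GPT-2 ratio)
}

# Each attention head operates on 384 / 6 = 64 dimensions.
head_dim = config["n_embd"] // config["n_head"]
```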

## Usage

```python
from transformers import pipeline

# Load the model
generator = pipeline("text-generation", model="endurasolution/ronmicro-llm-story")

# Generate a story
story = generator(
    "Once upon a time",
    max_new_tokens=150,
    temperature=0.7,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
    do_sample=True,
)

print(story[0]["generated_text"])
```
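
The `repetition_penalty=1.3` argument discourages the model from reusing tokens it has already produced. A minimal sketch of the idea, following the scheme Transformers uses (divide positive logits by the penalty, multiply negative ones) rather than the library's actual implementation:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """Penalize logits of tokens that already appear in the output:
    positive logits are divided by the penalty, negative ones multiplied,
    so previously generated tokens become less likely either way."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5, 0.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.3)
# Token 0's positive logit shrinks; token 1's negative logit grows more negative;
# tokens 2 and 3 were not generated, so they are untouched.
```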

## Example Outputs

**Prompt:** "Once upon a time"

**Output:** "Once upon a time, there was a little boy named Timmy. He loved to play with his toy cars and trucks all day long..."

## Limitations

- Trained on only 5% of TinyStories (Phase 1)
- May occasionally generate repetitive text
- Best for short children's stories (100-200 words)
- Limited to simple vocabulary and grammar
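
Because Phase 1 output can trail off mid-sentence or repeat, it can help to truncate generated text at the last complete sentence. A small hypothetical post-processing helper (the function name is ours, not part of the model or library):

```python
def trim_to_last_sentence(text: str) -> str:
    """Truncate text after the last sentence-ending punctuation mark,
    leaving it unchanged if no such mark is found."""
    cut = max(text.rfind(p) for p in (".", "!", "?"))
    return text[: cut + 1] if cut != -1 else text

print(trim_to_last_sentence("Timmy smiled. He ran home. Then he"))
# → Timmy smiled. He ran home.
```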

## Next Steps

Phase 2 training is in progress, using 20% of the data and 5 epochs for improved quality.

## Citation

Built using the TinyStories dataset:

```bibtex
@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}
```

## License

MIT License