---
license: mit
language:
- en
tags:
- text-generation
- gpt2
- stories
- children-stories
- tinystories
datasets:
- roneneldan/TinyStories
widget:
- text: "Once upon a time"
  example_title: "Story Beginning"
- text: "The little girl loved to"
  example_title: "Character Story"
- text: "In a magical forest,"
  example_title: "Fantasy Setting"
---
|
|
|
|
|
# RonMicro-LLM-Story (Phase 1)

A small GPT-2-style language model trained on the TinyStories dataset to generate children's stories.
|
|
|
|
|
## Model Details

- **Model Type:** GPT-2 causal language model
- **Parameters:** ~40M
- **Training Data:** TinyStories (5% subset, ~105K stories)
- **Vocabulary Size:** 25,913 tokens
- **Context Length:** 512 tokens
- **Training Epochs:** 3
- **Language:** English
|
|
|
|
|
## Training Details

- **Framework:** Transformers (Hugging Face)
- **Tokenizer:** Custom BPE trained on TinyStories
- **Architecture:**
  - 6 transformer layers
  - 384 embedding dimensions
  - 6 attention heads
  - 1536 FFN dimensions
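Assuming the standard Hugging Face GPT-2 implementation, the hyperparameters above map onto a `GPT2Config` roughly as follows. This is a sketch for orientation, not the exact configuration file used for training:

```python
from transformers import GPT2Config

# Assumed mapping of the listed dimensions onto GPT2Config fields.
config = GPT2Config(
    vocab_size=25_913,  # custom BPE vocabulary
    n_positions=512,    # context length
    n_layer=6,          # transformer layers
    n_embd=384,         # embedding dimensions
    n_head=6,           # attention heads
    n_inner=1536,       # FFN dimensions
)
```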
|
|
|
|
|
## Usage

```python
from transformers import pipeline

# Load the model
generator = pipeline("text-generation", model="endurasolution/ronmicro-llm-story")

# Generate a story
story = generator(
    "Once upon a time",
    max_new_tokens=150,
    temperature=0.7,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
    do_sample=True,
)

print(story[0]["generated_text"])
```
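The `no_repeat_ngram_size=3` setting prevents any trigram from appearing twice in the output. The underlying idea can be sketched in plain Python (an illustration of the technique, not the actual `transformers` implementation):

```python
def banned_next_tokens(ids, n=3):
    """Return tokens that would complete an n-gram already present in ids."""
    if len(ids) < n - 1:
        return set()
    prefix = tuple(ids[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    # Find every earlier occurrence of the prefix and ban the token that followed it.
    for i in range(len(ids) - n + 1):
        if tuple(ids[i:i + n - 1]) == prefix:
            banned.add(ids[i + n - 1])
    return banned

print(banned_next_tokens([5, 7, 9, 5, 7], n=3))  # {9}: emitting 9 would repeat (5, 7, 9)
```

At each decoding step, the sampler sets the logits of the banned tokens to negative infinity before sampling, so the repeated n-gram can never be produced.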
|
|
|
|
|
## Example Outputs

**Prompt:** "Once upon a time"

**Output:** "Once upon a time, there was a little boy named Timmy. He loved to play with his toy cars and trucks all day long..."
|
|
|
|
|
## Limitations

- Trained on only 5% of TinyStories (Phase 1)
- May occasionally generate repetitive text
- Best suited to short children's stories (100-200 words)
- Limited to simple vocabulary and grammar
|
|
|
|
|
## Next Steps

Phase 2 training is in progress, using 20% of the data for 5 epochs to improve quality.
|
|
|
|
|
## Citation

Built using the TinyStories dataset:

```
@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}
```
|
|
|
|
|
## License

MIT License