Stories-SLM 🤗
This model is part of a collection of Small Language Models pretrained from scratch on the TinyStories dataset. The collection currently contains 2 pretrained models, with more on the way. The model variants range from a standard GPT to Mixture-of-Experts versions built with RoPE, Grouped-Query Attention, and RMSNorm.
Model Name: Stories-SLM
Model Description
Stories-SLM is a small language model pretrained from scratch on the TinyStories dataset. It has ~53 million parameters and was trained for 10,000 steps on a single Tesla T4 GPU, using cross-entropy loss on the next-token prediction task over 674M tokens.
- Developed by: Namrata Thakur
- Model type: Text Generation
- Language(s) (NLP): English
- License: MIT
- Training Type: Pretraining
Model Sources
- Repository: GitHub Repo
- Demo: [More Information Needed]
How to Get Started with the Model
To install Stories-SLM, follow these steps:

```shell
# Clone the repository
git clone https://github.com/NamrataThakur/Large_Language_Model_From_Scratch_Implementation.git
cd Large_Language_Model_From_Scratch_Implementation

# Create a virtual environment and activate it
python -m venv env
source env/bin/activate  # on Windows: env\Scripts\activate

# Install the required packages
pip install -r requirements.txt
```
Uses
Stories-SLM can be used to generate small, grammatically and semantically coherent simple short stories suitable for children.
Chainlit Interface 🖥️
The easiest way to interact with Stories-SLM is through its Chainlit interface:
```shell
chainlit run app_pretrain.py
```
This will launch a web application where you can input text and see the model's generated responses.
Downloading from Hugging Face 🤗
To interact with the model by downloading it from Hugging Face:

- First clone the repo locally, then run:

```python
from transformer_blocks.gpt2 import GPT2
from gpt_Pretraining.text_generation import Text_Generation

model = GPT2.from_pretrained("NamrataThakur/Small_Language_Model_MHA_53M_Pretrained")
model.eval()

# Check the generation to make sure everything is okay
generation = Text_Generation(model=model, device='cpu', tokenizer_model='gpt2',
                             arch_type='original')

start_context = "One day, a "
response = generation.text_generation(input_text=start_context, max_new_tokens=160,
                                      temp=0.5, top_k=10, kv_cache=False)
print(response)
```
Model Architecture and Objective
Stories-SLM uses a standard GPT decoder-only transformer architecture with:
- Attention Type: Multi-Head Attention
- Normalization: LayerNorm
- Position Embedding: Learned absolute position encoding (as in GPT-2)
- Num transformer blocks: 8
- Num attention heads: 8
- Embedding dimensions: 384
- Vocabulary size: 50,257 tokens
- Context window: 256 tokens
- Feed-Forward Hidden Dimension: 1536
- Parameters: ~53M (52.88M exact)
- Overall Dropout: 0.2
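As a sanity check, the listed hyperparameters can be tallied into an approximate parameter count with plain Python. This sketch assumes GPT-2-style linear layers with biases and an untied output head; those details are not stated in this card, so treat the breakdown as illustrative rather than exact.

```python
# Hypothetical parameter-count tally for the hyperparameters listed above.
# Assumes GPT-2-style biased linear layers and an untied output head.
vocab, d, ctx, blocks, ffn = 50_257, 384, 256, 8, 1536

tok_emb = vocab * d                      # token embedding table
pos_emb = ctx * d                        # learned absolute positions
per_block = (
    d * 3 * d + 3 * d                    # fused QKV projection
    + d * d + d                          # attention output projection
    + d * ffn + ffn + ffn * d + d        # feed-forward (up + down projections)
    + 2 * 2 * d                          # two LayerNorms (scale + shift)
)
final_ln = 2 * d
lm_head = vocab * d                      # untied output projection, no bias

total = tok_emb + pos_emb + blocks * per_block + final_ln + lm_head
print(f"{total / 1e6:.2f}M parameters")  # lands near the reported ~53M
```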
Optimization Config:
- Optimizer: AdamW
- Weight Decay: 0.1
- Beta1: 0.9
- Beta2: 0.95
- Warmup Steps: 829
- Total Steps: 10,000
- Gradient Clipping: True
- Initial Learning Rate: 1e-05
- Maximum Learning Rate: 0.0008
- Gradient Accumulation Steps: 16
- Batch Size: 16
- Global Batch Size: 256
- Scheduler: Linear Increase, followed by Cosine Annealing
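The schedule above can be sketched in a few lines of plain Python: a linear ramp from the initial to the maximum learning rate over the warmup steps, then cosine annealing for the remaining steps. The decay floor is assumed to equal the initial learning rate, which this card does not specify.

```python
import math

INIT_LR, MAX_LR = 1e-5, 8e-4   # initial / maximum learning rates from the config
WARMUP, TOTAL = 829, 10_000    # warmup steps / total steps from the config

def lr_at(step: int) -> float:
    """Linear warmup, then cosine annealing (decay floor assumed = INIT_LR)."""
    if step < WARMUP:
        return INIT_LR + (MAX_LR - INIT_LR) * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return INIT_LR + 0.5 * (MAX_LR - INIT_LR) * (1 + math.cos(math.pi * progress))

# Peaks at MAX_LR exactly when warmup ends, then decays smoothly to the floor.
print(lr_at(0), lr_at(WARMUP), lr_at(TOTAL))
```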
Training Details
Training Data
The model was trained on the TinyStories dataset, a collection of short stories designed for training language models. Its simple narratives allow even a small model to learn coherent story generation, in contrast to the large corpora required by larger language models.
Training Procedure
Stories-SLM was trained using PyTorch on the TinyStories dataset. The training process involved:
- Tokenizing the input text
- Creating sliding windows of fixed block size
- Training the model with cross-entropy loss
- Applying learning rate scheduling with warmup and cosine decay
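The sliding-window step above can be illustrated with a minimal, hypothetical helper (plain Python lists, not the repository's actual data loader): each window of `block_size` tokens is paired with the same window shifted one position to the right, which is exactly the target for next-token prediction.

```python
def make_windows(token_ids, block_size, stride):
    """Slice a token stream into (input, target) pairs for next-token prediction."""
    pairs = []
    for start in range(0, len(token_ids) - block_size, stride):
        x = token_ids[start : start + block_size]           # model input
        y = token_ids[start + 1 : start + block_size + 1]   # input shifted by one
        pairs.append((x, y))
    return pairs

ids = list(range(10))   # stand-in for tokenized text
pairs = make_windows(ids, block_size=4, stride=4)
# pairs[0] == ([0, 1, 2, 3], [1, 2, 3, 4])
```

With `stride == block_size` the windows are non-overlapping; a smaller stride would reuse tokens across windows.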
Training Plots
- Learning Rate vs. Steps
- Loss vs. Steps
Inference
During inference, Stories-SLM uses several techniques to produce high-quality text:
- Temperature scaling to control randomness
- Top-k sampling to balance focus and diversity
- Autoregressive generation, one token at a time
- A max-new-tokens limit to bound generation length
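The first two techniques can be sketched in plain Python as a simplified stand-in for the `Text_Generation` class (which operates on tensors): divide the logits by the temperature, keep only the k largest, and sample from the resulting softmax distribution.

```python
import math
import random

def sample_next(logits, temp=0.5, top_k=10, rng=random):
    """Pick a token id: scale logits by temperature, keep the top-k, softmax, sample."""
    scaled = [l / temp for l in logits]
    cutoff = sorted(scaled, reverse=True)[top_k - 1]             # k-th largest value
    kept = [s if s >= cutoff else float("-inf") for s in scaled]  # mask the rest
    m = max(kept)
    exps = [math.exp(s - m) for s in kept]                       # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [0.1, 2.0, -1.0, 0.5, 3.0]
token_id = sample_next(logits, temp=0.5, top_k=2)   # only ids 1 and 4 can be drawn
```

Lower temperatures sharpen the distribution toward the most likely tokens; `top_k=1` reduces to greedy decoding.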
Results
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Single Tesla-T4 16GB
- Hours used: [More Information Needed]
- Cloud Provider: Lightning-AI
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support ❤️
If you find Stories-SLM useful, please consider starring the repository ⭐