---
tags:
- gpt
- language-model
- causal-lm
language:
- en
datasets:
- roneneldan/TinyStories
---

# SP-LM-alpha

A GPT model trained on the TinyStories dataset using PyTorch.

## Model Details

- **Model Type**: GPT (Causal Language Model)
- **Vocab Size**: 50257
- **Context Length**: 128
- **Layers**: 6
- **Attention Heads**: 6
- **Embedding Dimension**: 384
- **Training Dataset**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)

## Architecture

The model uses a transformer architecture with:

- Token and positional embeddings
- 6 transformer blocks
- Causal self-attention with 6 heads
- Feed-forward networks with GELU activation
- Layer normalization
- Residual connections

## Usage

### Quick Start

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import json
import torch

from sp_lm import GPT

repo_id = "wizardoftrap/SP-LM-alpha"

# Load the tokenizer and model configuration
tokenizer = AutoTokenizer.from_pretrained(repo_id)
config_dict = json.load(open(hf_hub_download(repo_id=repo_id, filename="config.json")))
config = type("Config", (), config_dict)()

# Download the weights and build the model
model_weights = load_file(hf_hub_download(repo_id=repo_id, filename="model.safetensors"))
model = GPT(config)
model.load_state_dict(model_weights)
model.eval()

# Generate text
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_ids"], max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

### Installation

1. Download the `sp_lm.py` file from this repo; it contains the `GPT` model definition.
2. Install the required packages:

```bash
pip install transformers safetensors huggingface-hub torch
```

3. Load and generate text as shown in Quick Start above.

## Training Details

- **Learning Rate**: 1e-4 with linear warmup and cosine annealing decay
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32
- **Max Iterations**: 20000
- **Optimizer**: AdamW with weight decay
- **Mixed Precision**: bfloat16 / float16
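The components listed under Architecture can be sketched as a minimal PyTorch model. This is an illustrative sketch only, not the actual `sp_lm.py` implementation; the class and attribute names (`Block`, `TinyGPT`, `ln1`, `ffn`, etc.) are assumptions, and `nn.MultiheadAttention` stands in for whatever attention module the repo actually uses.

```python
import torch
import torch.nn as nn

# Hyperparameters from the Model Details section above.
VOCAB, CTX, LAYERS, HEADS, DIM = 50257, 128, 6, 6, 384

class Block(nn.Module):
    """One transformer block: pre-LayerNorm attention and FFN, each with a residual."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(DIM)
        self.attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)
        self.ln2 = nn.LayerNorm(DIM)
        # Feed-forward network with GELU activation (4x expansion is an assumption)
        self.ffn = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.GELU(), nn.Linear(4 * DIM, DIM))

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True entries are masked out (no attending to the future)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                      # residual connection around attention
        x = x + self.ffn(self.ln2(x))  # residual connection around the FFN
        return x

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)   # token embeddings
        self.pos = nn.Embedding(CTX, DIM)     # learned positional embeddings
        self.blocks = nn.ModuleList(Block() for _ in range(LAYERS))
        self.ln_f = nn.LayerNorm(DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits over the vocabulary
```

A forward pass on a `(batch, time)` tensor of token ids returns `(batch, time, vocab)` logits, which is the shape `model.generate` samples from.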
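The learning-rate schedule above (linear warmup into cosine annealing) can be sketched as a plain function of the iteration count. Note that `warmup_iters` and `min_lr` are assumptions for illustration; the card does not state them. Only `max_lr` (1e-4) and `max_iters` (20000) come from the Training Details.

```python
import math

max_lr = 1e-4        # peak learning rate, from Training Details
min_lr = 1e-5        # assumed floor for the cosine decay
warmup_iters = 1000  # assumed warmup length
max_iters = 20000    # from Training Details

def get_lr(it: int) -> float:
    # Linear warmup: ramp from ~0 up to max_lr over warmup_iters steps
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    # Cosine annealing: decay from max_lr down to min_lr over the remaining steps
    progress = min((it - warmup_iters) / (max_iters - warmup_iters), 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

In a training loop this would typically be applied each step by setting `param_group["lr"] = get_lr(it)` on the AdamW optimizer's parameter groups before calling `optimizer.step()`.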