---
license: mit
tags:
- gpt
- language-model
- causal-lm
language:
- en
datasets:
- roneneldan/TinyStories
---

# SP-LM-alpha

A GPT model trained on the TinyStories dataset using PyTorch.

## Model Details

- **Model Type**: GPT (Causal Language Model)
- **Vocab Size**: 50257
- **Context Length**: 128
- **Layers**: 6
- **Attention Heads**: 6
- **Embedding Dimension**: 384
- **Training Dataset**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
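
These hyperparameters can be collected into a small config object. A minimal sketch, assuming a plain dataclass; `GPTConfig` and its field names are illustrative and not taken from the repository's actual code:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical config mirroring the Model Details above;
    # names are illustrative, not from the SP-LM-alpha source.
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    block_size: int = 128    # context length
    n_layer: int = 6         # transformer blocks
    n_head: int = 6          # attention heads
    n_embd: int = 384        # embedding dimension (64 dims per head)
```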

## Architecture

The model uses a transformer architecture with:

- Token and positional embeddings
- 6 transformer blocks
- Causal self-attention with 6 heads
- Feed-forward networks with GELU activation
- Layer normalization
- Residual connections
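
A minimal PyTorch sketch of one such block appears below. It assumes a pre-layer-norm arrangement and uses `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+) with `is_causal=True` for the causal mask; the repository's actual implementation may differ in these details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One transformer block: causal self-attention + GELU feed-forward,
    each wrapped in a residual connection with (assumed) pre-layer-norm."""

    def __init__(self, n_embd: int = 384, n_head: int = 6):
        super().__init__()
        self.n_head = n_head
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.proj = nn.Linear(n_embd, n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.attn(self.ln1(x)).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        x = x + self.proj(y)           # residual around attention
        x = x + self.mlp(self.ln2(x))  # residual around feed-forward
        return x
```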

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-username/SP-LM-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
```
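
The `pipeline` API offers a shorter alternative, assuming the repository ships a standard Transformers config and tokenizer. The sampling parameters here are illustrative:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="your-username/SP-LM-alpha")
# Sampling settings are illustrative; adjust to taste.
out = generator("Once upon a time", max_new_tokens=80,
                do_sample=True, temperature=0.8, top_k=50)
print(out[0]["generated_text"])
```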

## Training Details

- **Learning Rate**: 1e-4 with linear warmup and cosine annealing decay
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32
- **Max Iterations**: 20000
- **Optimizer**: AdamW with weight decay
- **Mixed Precision**: bfloat16 / float16
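
A condensed sketch of how these pieces might fit together in a PyTorch training loop. The warmup length, weight-decay value, `model` forward signature, and `get_batch` helper are all assumptions for illustration; this is not the actual training script:

```python
import math
import torch
import torch.nn.functional as F

def lr_at(step, max_lr=1e-4, warmup=1000, max_iters=20000):
    # Linear warmup to max_lr, then cosine annealing toward zero.
    # The warmup length of 1000 steps is an assumption.
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_iters - warmup)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

# weight_decay value is an assumption; the card only says "with weight decay"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)
accum_steps = 32

for step in range(20000):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    for _ in range(accum_steps):
        x, y = get_batch()  # hypothetical loader yielding (input, target) ids
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(x)  # assumed to return (batch, seq, vocab) logits
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        (loss / accum_steps).backward()  # accumulate over 32 micro-batches
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```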

## License

MIT License

## Model Card Contact

For questions or issues, please contact the model author.