---
tags:
- gpt
- language-model
- causal-lm
language:
- en
datasets:
- roneneldan/TinyStories
---
# SP-LM-alpha
A GPT model trained on the TinyStories dataset using PyTorch.
## Model Details
- **Model Type**: GPT (Causal Language Model)
- **Vocab Size**: 50257
- **Context Length**: 128
- **Layers**: 6
- **Attention Heads**: 6
- **Embedding Dimension**: 384
- **Training Dataset**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
## Architecture
The model uses a decoder-only transformer architecture with the following components (a minimal code sketch follows this list):
- Token and positional embeddings
- 6 transformer blocks
- Causal self-attention with 6 heads
- Feed-forward networks with GELU activation
- Layer normalization
- Residual connections
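For reference, the sketch below shows how these components typically fit together in a decoder-only GPT of this size. It is not the implementation from `sp_lm.py`: the class names, argument names, pre-norm block layout, and 4× feed-forward expansion are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block: causal self-attention and a GELU MLP, each with a residual."""
    def __init__(self, n_embd=384, n_head=6, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        # Causal mask: True entries mark positions a token is NOT allowed to attend to.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:T, :T], need_weights=False)
        x = x + attn_out               # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the feed-forward network
        return x

class TinyGPT(nn.Module):
    """Decoder-only GPT: token + positional embeddings, 6 blocks, final LayerNorm, LM head."""
    def __init__(self, vocab_size=50257, block_size=128, n_layer=6, n_head=6, n_embd=384):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.ModuleList([Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)  # token + positional embeddings
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))          # logits over the 50,257-token vocabulary
```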
## Usage
### Quick Start
```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import json
import torch
from sp_lm import GPT
repo_id = "wizardoftrap/SP-LM-alpha"

# Tokenizer and config come from the Hub; the config dict is wrapped in a
# simple attribute-access object before being passed to GPT(config).
tokenizer = AutoTokenizer.from_pretrained(repo_id)
with open(hf_hub_download(repo_id=repo_id, filename="config.json")) as f:
    config_dict = json.load(f)
config = type("Config", (), config_dict)()

# Load the safetensors weights and switch the model to inference mode.
model_weights = load_file(hf_hub_download(repo_id=repo_id, filename="model.safetensors"))
model = GPT(config)
model.load_state_dict(model_weights)
model.eval()

# Generate a continuation of the prompt.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_ids"], max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
### Installation
1. Download the `sp_lm.py` file from this repo; it defines the `GPT` model class used above.
2. Install required packages:
```bash
pip install transformers safetensors huggingface-hub torch
```
3. Load the model and generate text as shown in the Quick Start above.
## Training Details
- **Learning Rate**: 1e-4 (peak) with linear warmup followed by cosine annealing decay (see the schedule sketch after this list)
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32 (effective batch size: 32 × 32 = 1,024 sequences per optimizer step)
- **Max Iterations**: 20000
- **Optimizer**: AdamW with weight decay
- **Mixed Precision**: bfloat16 / float16
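The learning-rate schedule described above (linear warmup to the peak rate, then cosine annealing) can be sketched as follows. Only the peak rate and iteration count come from this card; the warmup length, minimum learning rate, and weight-decay value are not specified, so the numbers below are placeholder assumptions.

```python
import math

max_lr = 1e-4          # peak learning rate (from the card)
max_iters = 20_000     # from the card
warmup_iters = 1_000   # assumption: not specified on the card
min_lr = 1e-5          # assumption: not specified on the card

def lr_at(it: int) -> float:
    """Linear warmup to max_lr, then cosine annealing down to min_lr."""
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative use with AdamW (the weight_decay value here is an assumption):
# optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=0.1)
# for it in range(max_iters):
#     for group in optimizer.param_groups:
#         group["lr"] = lr_at(it)
#     ...  # accumulate gradients over 32 micro-batches, then optimizer.step()
```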