---
tags:
- gpt
- language-model
- causal-lm
language:
- en
datasets:
- roneneldan/TinyStories
---

# SP-LM-alpha

A small GPT-style causal language model trained on the TinyStories dataset using PyTorch.

## Model Details

- **Model Type**: GPT (Causal Language Model)
- **Vocab Size**: 50257
- **Context Length**: 128
- **Layers**: 6
- **Attention Heads**: 6
- **Embedding Dimension**: 384
- **Training Dataset**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)

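
The hyperparameters above are enough for a rough parameter estimate. The sketch below is not taken from `sp_lm.py`. It assumes a GPT-2-style layout: a learned positional embedding table, a fused query/key/value projection, a 4x feed-forward expansion, biases on every linear and normalization layer, and an output head tied to the token embedding. The function name and its `mlp_ratio`, `bias`, and `tie_weights` arguments are hypothetical, so treat the result as an estimate rather than the model's actual size.

```python
def estimate_gpt_params(vocab_size=50257, block_size=128, n_layer=6, n_embd=384,
                        mlp_ratio=4, bias=True, tie_weights=True):
    """Estimate parameter counts for a GPT-2-style model (layout is an assumption)."""
    d = n_embd
    b = 1 if bias else 0

    tok_emb = vocab_size * d              # token embedding table
    pos_emb = block_size * d              # learned positional embedding table

    # Attention: fused QKV projection plus output projection.
    attn = (d * 3 * d + b * 3 * d) + (d * d + b * d)
    # Feed-forward: expand to mlp_ratio * d, then project back to d.
    hidden = mlp_ratio * d
    mlp = (d * hidden + b * hidden) + (hidden * d + b * d)
    # Two LayerNorms per block (scale, plus shift when bias is enabled).
    norms = 2 * (d + b * d)
    per_block = attn + mlp + norms

    final_norm = d + b * d
    # A tied output head reuses the token embedding and adds no parameters.
    head = 0 if tie_weights else vocab_size * d

    total = tok_emb + pos_emb + n_layer * per_block + final_norm + head
    return {"tok_emb": tok_emb, "pos_emb": pos_emb, "per_block": per_block, "total": total}


if __name__ == "__main__":
    counts = estimate_gpt_params()
    print(f"estimated total: {counts['total'] / 1e6:.1f}M parameters")
```

Under these assumptions the token embedding accounts for most of the parameters, because the vocabulary (50257) is much larger than the embedding dimension (384).
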
## Architecture

The model uses a transformer architecture with:
- Token and positional embeddings
- 6 transformer blocks
- Causal self-attention with 6 heads
- Feed-forward networks with GELU activation
- Layer normalization
- Residual connections

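
To show how the listed components fit together, here is a minimal sketch of a single transformer block in PyTorch. It is an illustration and not the code in `sp_lm.py`: the class name `Block`, the pre-norm ordering (LayerNorm before each sub-layer), the fused QKV projection, and the 4x feed-forward width are all assumptions. It relies on `torch.nn.functional.scaled_dot_product_attention`, which requires PyTorch 2.0 or newer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    """One pre-norm block: causal self-attention and a GELU MLP, each with a residual."""

    def __init__(self, n_embd=384, n_head=6):
        super().__init__()
        assert n_embd % n_head == 0, "embedding dimension must divide evenly across heads"
        self.n_head = n_head
        self.ln_1 = nn.LayerNorm(n_embd)
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # attention output projection
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def attention(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape (B, T, C) -> (B, n_head, T, head_dim) so each head attends separately.
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        # is_causal=True prevents each position from attending to later positions.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Merge the heads back into a single (B, T, C) tensor.
        return self.proj(y.transpose(1, 2).contiguous().view(B, T, C))

    def forward(self, x):
        x = x + self.attention(self.ln_1(x))   # residual connection around attention
        x = x + self.mlp(self.ln_2(x))         # residual connection around the MLP
        return x


if __name__ == "__main__":
    block = Block().eval()
    with torch.no_grad():
        out = block(torch.randn(1, 16, 384))
    print(out.shape)
```

With 384 embedding dimensions and 6 heads, each head works on 64 dimensions. A full model would stack 6 such blocks between the embeddings and the output head.
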
## Usage

### Quick Start

```python
import json

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoTokenizer

from sp_lm import GPT  # sp_lm.py from this repo (see Installation)

repo_id = "wizardoftrap/SP-LM-alpha"

tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Build a simple config object whose attributes mirror config.json
with open(hf_hub_download(repo_id=repo_id, filename="config.json")) as f:
    config_dict = json.load(f)
config = type("Config", (), config_dict)()

# Load the weights and switch the model to inference mode
model_weights = load_file(hf_hub_download(repo_id=repo_id, filename="model.safetensors"))
model = GPT(config)
model.load_state_dict(model_weights)
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_ids"], max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

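
The `temperature` and `top_k` arguments control how the next token is chosen. How `model.generate` in `sp_lm.py` implements them is not shown here, so the following is a plain-Python sketch of the common approach: keep the `top_k` highest-scoring tokens, divide their logits by `temperature`, and sample from the resulting softmax distribution. The function `sample_top_k` and its `rng` argument are hypothetical names used for illustration.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Keep only the ids of the k highest-scoring tokens.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Lower temperatures sharpen the distribution, higher ones flatten it.
    scaled = [logits[i] / temperature for i in top_ids]

    # Softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # random.choices normalizes the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, -1.0, 1.5]
    print(sample_top_k(toy_logits, temperature=1.0, top_k=2))
```

With `top_k=1` this reduces to greedy decoding, since only the highest-scoring token remains in the candidate set.
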
### Installation

1. Download `sp_lm.py` from this repo; it defines the `GPT` model class.

2. Install the required packages:
   ```bash
   pip install transformers safetensors huggingface-hub torch
   ```

3. Load the model and generate text as shown in Quick Start.

## Training Details

- **Learning Rate**: 1e-4 with linear warmup and cosine annealing decay
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32
- **Max Iterations**: 20000
- **Optimizer**: AdamW with weight decay
- **Mixed Precision**: bfloat16 / float16
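
As an illustration of the schedule and batch settings listed above, the sketch below computes a learning rate with linear warmup followed by cosine decay, along with the effective batch size implied by gradient accumulation. Only the 1e-4 rate, the 20000 iterations, the batch size, the accumulation steps, and the context length come from this card. Treating 1e-4 as the peak rate, the `warmup_steps=1000` and `min_lr=1e-5` values, and the assumption that every sequence fills the full 128-token context are placeholders, not the settings actually used in training.

```python
import math

BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 32
CONTEXT_LENGTH = 128

# Gradient accumulation sums gradients over several micro-batches per optimizer update.
sequences_per_update = BATCH_SIZE * GRAD_ACCUM_STEPS          # 1024 sequences
tokens_per_update = sequences_per_update * CONTEXT_LENGTH     # 131072 tokens (full-length sequences assumed)


def lr_at(step, peak_lr=1e-4, warmup_steps=1000, max_steps=20000, min_lr=1e-5):
    """Learning rate at a given step; warmup_steps and min_lr are assumed values."""
    if step < warmup_steps:
        # Linear warmup: ramp from near zero up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (peak_lr - min_lr)


if __name__ == "__main__":
    for step in (0, 500, 1000, 10000, 20000):
        print(step, f"{lr_at(step):.2e}")
```

In a PyTorch training loop, a function like this would typically be evaluated once per iteration and written into each optimizer parameter group's `lr` before the update.
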