|
|
--- |
|
|
license: mit |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- music |
|
|
- audio |
|
|
- commodore-64 |
|
|
- sid |
|
|
- chiptune |
|
|
- generative |
|
|
- gpt2 |
|
|
- transformer |
|
|
--- |
|
|
|
|
|
# SID-GPT 25M |
|
|
|
|
|
A GPT model trained to generate Commodore 64 SID music by learning from legendary composers. |
|
|
|
|
|
**[Listen to samples](#audio-samples)** | **[GitHub](https://github.com/M64GitHub/SidGPT)** |
|
|
|
|
|
## Model Description |
|
|
|
|
|
SID-GPT learns to predict SID register states frame-by-frame, essentially learning the "language" of C64 chiptune music. Trained on 2,410 songs from HVSC, it produces output with recognizable musical structures: kick drums, PWM sweeps, basslines, and arpeggios. |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Parameters | 25.7M |
| Architecture | 8 layers, 8 heads, 512-dim embedding |
| Block Size | 1020 tokens (20 frames) |
| Effective Context | 12 frames (0.24 sec) |
| Vocabulary | 22 tokens |
| Validation Loss | 0.207 |
| Training Time | 31 hours on M4 MacBook |
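The 25.7M figure is consistent with a standard GPT layout; a quick back-of-envelope check (assuming nanoGPT-style blocks with a weight-tied output head — LayerNorms and biases add a little more):

```python
# Rough parameter count from the architecture table above.
n_layer, n_embd, block_size, vocab = 8, 512, 1020, 22

blocks  = n_layer * 12 * n_embd**2   # attention (4*d^2) + MLP (8*d^2) per layer
tok_emb = vocab * n_embd             # token embedding (tied with output head)
pos_emb = block_size * n_embd        # learned positional embedding

total = blocks + tok_emb + pos_emb
print(f"{total / 1e6:.1f}M")         # 25.7M, matching the table
```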
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Source**: [HVSC](https://hvsc.c64.org/) (High Voltage SID Collection) |
|
|
- **Size**: 1GB of register dump sequences (2,410 SID files) |
|
|
- **Composers**: DRAX (530 songs), Laxity (287), Rob Hubbard (96), Jeroen Tel (176), Martin Galway (40), and 10 others |
|
|
|
|
|
## Files |
|
|
|
|
|
| File | Size | Description |
|------|------|-------------|
| `sid-gpt-xxxx.bin` | 98 MB | Exported weights for Zig inference |
| `sid-gpt-xxxx.pt` | 295 MB | PyTorch checkpoint (includes optimizer state) |
| `config.json` | 1 KB | Model configuration |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Zig Inference Engine (Recommended) |
|
|
|
|
|
The native Zig engine runs at roughly 120-350 tok/s (with SIMD and KV caching), depending on the context window:
|
|
```bash
# Clone repository and build
git clone https://github.com/M64GitHub/SidGPT
cd SidGPT
zig build -Doptimize=ReleaseFast

# Download model
wget https://huggingface.co/M64/sid-gpt-25m/resolve/main/sid-gpt-1700.bin -P models/

# Generate and play
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 | ./zig-out/bin/sidgpt-play

# Or export to WAV
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 --output music.txt
./zig-out/bin/sidgpt-play music.txt --output-wav music.wav
```
|
|
|
|
|
### Python Inference |
|
|
```bash
cd training
python sample_sid.py --checkpoint path/to/sid-gpt-1700.pt --num_frames 700 --temperature 0.95
```
|
|
|
|
|
## Generation Tips |
|
|
|
|
|
**Good seeds to try**: 1337, 7391738264, 7391738265, 4829173650 |
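The `--temp` flag behaves like standard softmax temperature sampling: values below 1.0 sharpen the distribution (safer, more repetitive output), values above 1.0 flatten it (more variety, more noise). A minimal sketch of the idea, not the actual Zig implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.9, rng=random):
    """Sample a token index from logits after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total                     # walk the CDF until we pass r
        if r < acc:
            return i
    return len(logits) - 1
```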
|
|
|
|
|
## Audio Samples |
|
|
|
|
|
Generated outputs from this model: |
|
|
|
|
|
| Sample | Seed | Temp | Description |
|--------|------|------|-------------|
| [test.wav](samples/test.wav) | 7391738265 | 0.95 | Melodic arps with bassline and kicks |
|
|
|
|
|
## Proof of Concept Status |
|
|
|
|
|
Despite only 12 frames (0.24 sec) of context, the model learned real SID techniques: |
|
|
|
|
|
- **Kick drums** - Pulse wave frequency sweeps transitioning to noise |
|
|
- **PWM sweeps** - Pulse width modulation fades (Rob Hubbard signature) |
|
|
- **Basslines** - Melodic bass patterns with movement |
|
|
- **Arpeggios** - Fast note sequences typical of SID music |
|
|
- **Leads** - Fading-in lead voices |
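The kick-drum technique listed above can be sketched as a short per-frame register sequence for one SID voice: a rapid pulse-wave pitch sweep followed by a noise transient. The register values here are illustrative hand-written examples, not model output:

```python
# Illustrative SID kick drum for one voice; control-register waveform
# bits: pulse = $40, noise = $80, gate = $01.
PULSE, NOISE, GATE = 0x40, 0x80, 0x01

def kick_frames(start_freq=0x3000, sweep_steps=4):
    """Yield (16-bit frequency, control byte) per 50 Hz frame."""
    freq = start_freq
    for _ in range(sweep_steps):
        yield freq, PULSE | GATE   # gated pulse, pitch halving each frame
        freq //= 2
    yield 0x0800, NOISE | GATE     # switch to noise for the click

for freq, ctrl in kick_frames():
    print(f"freq=${freq:04X} ctrl=${ctrl:02X}")
```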
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Short context**: 12 frames = no long-range song structure |
|
|
- **Seed dependent**: Quality varies significantly with random seed |
|
|
- **No conditioning**: Cannot specify style/artist (planned for v2) |
|
|
- **Pattern matching**: Reproduces learned local techniques rather than composing full songs
|
|
|
|
|
## Training Details |
|
|
```
Loss progression:
Iter    0: 2.88  (random)
Iter  200: 0.96  (structure learned)
Iter  700: 0.37  (musical patterns)
Iter 1000: 0.27  (kick drums, PWM)
Iter 2000: 0.21  (best checkpoint)
```
|
|
|
|
|
Training was stopped at iter 2000 when validation loss plateaued and train/val gap exceeded 30% (indicating overfitting). |
|
|
|
|
|
## Technical Details |
|
|
|
|
|
### Data Format |
|
|
|
|
|
Each frame is 25 SID registers encoded as 50 hex characters + newline: |
|
|
```
B0080005410A306011C0064108200016800D41082000B4031F
B0084005410A30601100074108200016C00D41082000B4031F
...
<end>
```
|
|
|
|
|
- 50 frames = 1 second of audio |
|
|
- Vocabulary: `0-9`, `A-F`, `<`, `>`, `d`, `e`, `n`, `\n` (22 tokens) |
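Encoding a frame is then a straight character-to-ID mapping. A sketch of the 22-token vocabulary listed above (the actual token-ID ordering used by SID-GPT is an assumption here):

```python
# Character-level vocabulary: 16 hex digits plus the six marker/newline tokens.
VOCAB = list("0123456789ABCDEF") + ["<", ">", "d", "e", "n", "\n"]
STOI = {ch: i for i, ch in enumerate(VOCAB)}

def encode(text: str) -> list[int]:
    """Map each character of a register-dump line to a token ID."""
    return [STOI[ch] for ch in text]

frame = "B0080005410A306011C0064108200016800D41082000B4031F\n"
ids = encode(frame)                  # 50 hex chars + newline = 51 tokens
regs = bytes.fromhex(frame.strip())  # back to 25 raw register bytes
```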
|
|
|
|
|
### Inference Optimizations |
|
|
|
|
|
The Zig engine includes: |
|
|
- **KV Cache**: 50-100x speedup for autoregressive generation |
|
|
- **SIMD**: `@Vector(8, f32)` operations, 24x speedup
|
|
- **Sliding Window**: Infinite generation beyond context length |
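The sliding window is what lets a 12-frame model emit 700-frame songs: the generation loop simply truncates its input to the most recent tokens. A minimal sketch (the real engine does this in Zig with a KV cache; `next_token_fn` stands in for the model's forward pass plus sampling):

```python
def generate(next_token_fn, prompt, num_tokens, context=612):
    """Sliding-window autoregressive loop: the model only ever sees the
    last `context` tokens (12 frames x 51 tokens/frame = 612 by default),
    so total output length is unbounded."""
    out = list(prompt)
    for _ in range(num_tokens):
        window = out[-context:]            # drop tokens older than the window
        out.append(next_token_fn(window))
    return out

# Toy stand-in "model" that echoes the oldest token in its window:
seq = generate(lambda w: w[0], [1, 2, 3], 5, context=2)
```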
|
|
|
|
|
## Citation |
|
|
```bibtex |
|
|
@misc{sidgpt2026, |
|
|
author = {Mario Schallner}, |
|
|
title = {SID-GPT: Transformer-based Commodore 64 Music Generation}, |
|
|
year = {2026}, |
|
|
publisher = {Hugging Face}, |
|
|
url = {https://huggingface.co/M64/sid-gpt-25m} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Links |
|
|
|
|
|
- [GitHub Repository](https://github.com/M64GitHub/SidGPT) |
|
|
- [Training Dataset](https://huggingface.co/datasets/M64/sid-music): 1GB of register dump sequences (2,410 SID files)
|
|
- [HVSC - Training Data Source](https://hvsc.c64.org/) |
|
|
- [Blog Post](#) *(coming soon)* |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Thanks to the legendary C64 composers whose work made this possible: Matt Gray, Jeroen Tel, Rob Hubbard, Martin Galway, DRAX, Laxity, and all contributors to HVSC. |
|
|
|