---
license: mit
language:
- en
tags:
- music
- audio
- commodore-64
- sid
- chiptune
- generative
- gpt2
- transformer
---
# SID-GPT 25M
A GPT model trained to generate Commodore 64 SID music by learning from legendary composers.
**[Listen to samples](#audio-samples)** | **[GitHub](https://github.com/M64GitHub/SidGPT)**
## Model Description
SID-GPT learns to predict SID register states frame-by-frame, essentially learning the "language" of C64 chiptune music. Trained on 2,410 songs from HVSC, it produces output with recognizable musical structures: kick drums, PWM sweeps, basslines, and arpeggios.
| Parameter | Value |
|-----------|-------|
| Parameters | 25.7M |
| Architecture | 8 layers, 8 heads, 512 embedding |
| Block Size | 1020 tokens (20 frames) |
| Effective Context | 12 frames (0.24 sec) |
| Vocabulary | 22 tokens |
| Validation Loss | 0.207 |
| Training Time | 31 hours on M4 MacBook |
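The block size follows directly from the frame encoding: each frame is 50 hex characters plus a newline (51 tokens), so 20 frames fill the 1020-token block, and 12 frames at the PAL C64's 50 fps give the 0.24-second effective context. A quick arithmetic check in Python (a sketch; variable names are mine, not the codebase's):

```python
# Each frame: 50 hex characters + 1 newline = 51 tokens
tokens_per_frame = 50 + 1
block_size = 20 * tokens_per_frame   # 20 frames per training block
context_frames = 12                  # effective context at inference time
pal_fps = 50                         # PAL C64 updates the SID at 50 Hz

assert block_size == 1020
print(context_frames / pal_fps)      # 0.24 seconds of audible context
```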
## Training Data
- **Source**: [HVSC](https://hvsc.c64.org/) (High Voltage SID Collection)
- **Size**: 1GB of register dump sequences (2,410 SID files)
- **Composers**: DRAX (530 songs), Laxity (287), Rob Hubbard (96), Jeroen Tel (176), Martin Galway (40), and 10 others
## Files
| File | Size | Description |
|------|------|-------------|
| `sid-gpt-xxxx.bin` | 98 MB | Exported weights for Zig inference |
| `sid-gpt-xxxx.pt` | 295 MB | PyTorch checkpoint (includes optimizer state) |
| `config.json` | 1 KB | Model configuration |
## Usage
### Zig Inference Engine (Recommended)
The native Zig engine runs at ~120-350 tok/s with SIMD and KV caching, depending on the context window:
```bash
# Clone repository
git clone https://github.com/M64GitHub/SidGPT
cd SidGPT
zig build -Doptimize=ReleaseFast
# Download model
wget https://huggingface.co/M64/sid-gpt-25m/resolve/main/sid-gpt-1700.bin -P models/
# Generate and play
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 | ./zig-out/bin/sidgpt-play
# Or export to WAV
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 --output music.txt
./zig-out/bin/sidgpt-play music.txt --output-wav music.wav
```
### Python Inference
```bash
cd training
python sample_sid.py --checkpoint path/to/sid-gpt-1700.pt --num_frames 700 --temperature 0.95
```
## Generation Tips
**Good seeds to try**: 1337, 7391738264, 7391738265, 4829173650
## Audio Samples
Generated outputs from this model:
| Sample | Seed | Temp | Description |
|--------|------|------|-------------|
| [test.wav](samples/test.wav) | 7391738265 | 0.95 | Melodic arps with bassline and kicks |
## Proof of Concept Status
Despite a context of only 12 frames (0.24 sec), the model learned real SID techniques:
- **Kick drums** - Pulse wave frequency sweeps transitioning to noise
- **PWM sweeps** - Pulse width modulation fades (Rob Hubbard signature)
- **Basslines** - Melodic bass patterns with movement
- **Arpeggios** - Fast note sequences typical of SID music
- **Leads** - Fading-in lead voices
## Limitations
- **Short context**: 12 frames = no long-range song structure
- **Seed dependent**: Quality varies significantly with random seed
- **No conditioning**: Cannot specify style/artist (planned for v2)
- **Pattern matching**: Learns techniques, not "composing"
## Training Details
```
Loss progression:
Iter 0: 2.88 (random)
Iter 200: 0.96 (structure learned)
Iter 700: 0.37 (musical patterns)
Iter 1000: 0.27 (kick drums, PWM)
Iter 2000: 0.21 (best checkpoint)
```
Training was stopped at iter 2000 when validation loss plateaued and train/val gap exceeded 30% (indicating overfitting).
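The stopping rule above can be expressed as a small check. This is a sketch under my own assumptions: the gap is taken relative to validation loss, and the plateau detection is left to the caller.

```python
def should_stop(train_loss: float, val_loss: float,
                plateaued: bool, gap_limit: float = 0.30) -> bool:
    """Stop when validation loss has plateaued AND the train/val gap exceeds 30%."""
    gap = (val_loss - train_loss) / val_loss   # relative overfitting gap (assumption)
    return plateaued and gap > gap_limit

# Around iter 2000: val loss 0.21, train loss ~0.14 -> gap ~33%, stop
print(should_stop(train_loss=0.14, val_loss=0.21, plateaued=True))
```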
## Technical Details
### Data Format
Each frame is 25 SID registers encoded as 50 hex characters + newline:
```
B0080005410A306011C0064108200016800D41082000B4031F
B0084005410A30601100074108200016C00D41082000B4031F
...
<end>
```
- 50 frames = 1 second of audio
- Vocabulary: `0-9`, `A-F`, `<`, `>`, `d`, `e`, `n`, `\n` (22 tokens)
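As an illustration, a minimal Python sketch (the function name is hypothetical, not part of the toolchain) that decodes one dump line back into the 25 raw SID register bytes:

```python
def decode_frame(line: str) -> list[int]:
    """Decode one 50-hex-character dump line into 25 SID register bytes."""
    line = line.strip()
    assert len(line) == 50, "expected 25 registers = 50 hex characters"
    return [int(line[i:i + 2], 16) for i in range(0, 50, 2)]

regs = decode_frame("B0080005410A306011C0064108200016800D41082000B4031F")
print(len(regs))     # 25 register values
print(hex(regs[0]))  # 0xb0, first register byte of the frame
```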
### Inference Optimizations
The Zig engine includes:
- **KV Cache**: 50-100x speedup for autoregressive generation
- **SIMD**: `@Vector(8, f32)` operations, 24x speedup
- **Sliding Window**: Infinite generation beyond context length
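The sliding-window idea can be sketched in a few lines of Python. This is not the engine's code: `model_step` stands in for a single forward pass, and the default window of 612 tokens assumes 12 frames times 51 tokens per frame.

```python
def generate(model_step, prompt: list[int], n_tokens: int,
             context: int = 12 * 51) -> list[int]:
    """Autoregressive generation with a sliding context window.

    model_step: any callable mapping a token window to the next token id.
    The window is clipped to `context`, so generation can run indefinitely.
    """
    tokens = list(prompt)
    for _ in range(n_tokens):
        window = tokens[-context:]          # keep only the most recent tokens
        tokens.append(model_step(window))   # model never sees beyond the window
    return tokens

# Toy "model" that always emits token 0, just to show the call shape
out = generate(lambda w: 0, prompt=[1, 2, 3], n_tokens=5)
print(len(out))  # 8 tokens: 3 prompt + 5 generated
```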
## Citation
```bibtex
@misc{sidgpt2026,
  author    = {Mario Schallner},
  title     = {SID-GPT: Transformer-based Commodore 64 Music Generation},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/M64/sid-gpt-25m}
}
```
## Links
- [GitHub Repository](https://github.com/M64GitHub/SidGPT)
- [Training Dataset](https://huggingface.co/datasets/M64/sid-music): 1 GB of register dump sequences (2,410 SID files)
- [HVSC - Training Data Source](https://hvsc.c64.org/)
- [Blog Post](#) *(coming soon)*
## Acknowledgments
Thanks to the legendary C64 composers whose work made this possible: Matt Gray, Jeroen Tel, Rob Hubbard, Martin Galway, DRAX, Laxity, and all contributors to HVSC.