|
|
--- |
|
|
license: mit |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- music |
|
|
- audio |
|
|
- commodore-64 |
|
|
- sid |
|
|
- chiptune |
|
|
- generative |
|
|
- gpt2 |
|
|
- transformer |
|
|
--- |
|
|
|
|
|
# SID-GPT 25M |
|
|
|
|
|
A GPT model trained to generate Commodore 64 SID music by learning from legendary composers. |
|
|
|
|
|
**[Listen to samples](#audio-samples)** | **[GitHub](https://github.com/M64GitHub/SidGPT)** |
|
|
|
|
|
## Model Description |
|
|
|
|
|
SID-GPT learns to predict SID register states frame-by-frame, essentially learning the "language" of C64 chiptune music. Trained on 2,410 songs from HVSC, it produces output with recognizable musical structures: kick drums, PWM sweeps, basslines, and arpeggios. |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Parameters | 25.7M |
| Architecture | 8 layers, 8 heads, 512-dim embedding |
| Block Size | 1020 tokens (20 frames) |
| Effective Context | 12 frames (0.24 sec) |
| Vocabulary | 22 tokens |
| Validation Loss | 0.207 |
| Training Time | 31 hours on M4 MacBook |
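The 25.7M figure is consistent with a standard GPT layout; a quick back-of-envelope check (assuming nanoGPT-style blocks with a weight-tied output head — LayerNorms and biases add a little more):

```python
# Rough parameter count from the architecture table above.
n_layer, n_embd, block_size, vocab = 8, 512, 1020, 22

blocks  = n_layer * 12 * n_embd**2   # attention (4*d^2) + MLP (8*d^2) per layer
tok_emb = vocab * n_embd             # token embedding (tied with output head)
pos_emb = block_size * n_embd        # learned positional embedding

total = blocks + tok_emb + pos_emb
print(f"{total / 1e6:.1f}M")         # 25.7M, matching the table
```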
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Source**: [HVSC](https://hvsc.c64.org/) (High Voltage SID Collection) |
|
|
- **Size**: 1GB of register dump sequences (2,410 SID files) |
|
|
- **Composers**: DRAX (530 songs), Laxity (287), Rob Hubbard (96), Jeroen Tel (176), Martin Galway (40), and 10 others |
|
|
|
|
|
## Files |
|
|
|
|
|
| File | Size | Description |
|------|------|-------------|
| `sid-gpt-xxxx.bin` | 98 MB | Exported weights for Zig inference |
| `sid-gpt-xxxx.pt` | 295 MB | PyTorch checkpoint (includes optimizer state) |
| `config.json` | 1 KB | Model configuration |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Zig Inference Engine (Recommended) |
|
|
|
|
|
The native Zig engine runs at roughly 120-350 tok/s (with SIMD and KV caching), depending on the context window:
|
|
```bash
# Clone repository and build
git clone https://github.com/M64GitHub/SidGPT
cd SidGPT
zig build -Doptimize=ReleaseFast

# Download model
wget https://huggingface.co/M64/sid-gpt-25m/resolve/main/sid-gpt-1700.bin -P models/

# Generate and play
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 | ./zig-out/bin/sidgpt-play

# Or export to WAV
./zig-out/bin/sidgpt --model models/sid-gpt-1700.bin --frames 700 --temp 0.90 --seed 7391738265 --context 12 --output music.txt
./zig-out/bin/sidgpt-play music.txt --output-wav music.wav
```
|
|
|
|
|
### Python Inference |
|
|
```bash
cd training
python sample_sid.py --checkpoint path/to/sid-gpt-1700.pt --num_frames 700 --temperature 0.95
```
|
|
|
|
|
## Generation Tips |
|
|
|
|
|
**Good seeds to try**: 1337, 7391738264, 7391738265, 4829173650 |
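The `--temp` flag behaves like standard softmax temperature sampling: values below 1.0 sharpen the distribution (safer, more repetitive output), values above 1.0 flatten it (more variety, more noise). A minimal sketch of the idea, not the actual Zig implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.9, rng=random):
    """Sample a token index from logits after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total                     # walk the CDF until we pass r
        if r < acc:
            return i
    return len(logits) - 1
```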
|
|
|
|
|
## Audio Samples |
|
|
|
|
|
Generated outputs from this model: |
|
|
|
|
|
| Sample | Seed | Temp | Description |
|--------|------|------|-------------|
| [test.wav](samples/test.wav) | 7391738265 | 0.95 | Melodic arps with bassline and kicks |
|
|
|
|
|
## Proof of Concept Status |
|
|
|
|
|
Despite only 12 frames (0.24 sec) of context, the model learned real SID techniques: |
|
|
|
|
|
- **Kick drums** - Pulse wave frequency sweeps transitioning to noise |
|
|
- **PWM sweeps** - Pulse width modulation fades (Rob Hubbard signature) |
|
|
- **Basslines** - Melodic bass patterns with movement |
|
|
- **Arpeggios** - Fast note sequences typical of SID music |
|
|
- **Leads** - Fading-in lead voices |
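The kick-drum technique listed above can be sketched as a short per-frame register sequence for one SID voice: a rapid pulse-wave pitch sweep followed by a noise transient. The register values here are illustrative hand-written examples, not model output:

```python
# Illustrative SID kick drum for one voice; control-register waveform
# bits: pulse = $40, noise = $80, gate = $01.
PULSE, NOISE, GATE = 0x40, 0x80, 0x01

def kick_frames(start_freq=0x3000, sweep_steps=4):
    """Yield (16-bit frequency, control byte) per 50 Hz frame."""
    freq = start_freq
    for _ in range(sweep_steps):
        yield freq, PULSE | GATE   # gated pulse, pitch halving each frame
        freq //= 2
    yield 0x0800, NOISE | GATE     # switch to noise for the click

for freq, ctrl in kick_frames():
    print(f"freq=${freq:04X} ctrl=${ctrl:02X}")
```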
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Short context**: 12 frames = no long-range song structure |
|
|
- **Seed dependent**: Quality varies significantly with random seed |
|
|
- **No conditioning**: Cannot specify style/artist (planned for v2) |
|
|
- **Pattern matching**: Reproduces learned local techniques rather than composing full songs
|
|
|
|
|
## Training Details |
|
|
```
Loss progression:
Iter    0: 2.88  (random)
Iter  200: 0.96  (structure learned)
Iter  700: 0.37  (musical patterns)
Iter 1000: 0.27  (kick drums, PWM)
Iter 2000: 0.21  (best checkpoint)
```
|
|
|
|
|
Training was stopped at iter 2000 when validation loss plateaued and train/val gap exceeded 30% (indicating overfitting). |
|
|
|
|
|
## Technical Details |
|
|
|
|
|
### Data Format |
|
|
|
|
|
Each frame is 25 SID registers encoded as 50 hex characters + newline: |
|
|
```
B0080005410A306011C0064108200016800D41082000B4031F
B0084005410A30601100074108200016C00D41082000B4031F
...
<end>
```
|
|
|
|
|
- 50 frames = 1 second of audio |
|
|
- Vocabulary: `0-9`, `A-F`, `<`, `>`, `d`, `e`, `n`, `\n` (22 tokens) |
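Encoding a frame is then a straight character-to-ID mapping. A sketch of the 22-token vocabulary listed above (the actual token-ID ordering used by SID-GPT is an assumption here):

```python
# Character-level vocabulary: 16 hex digits plus the six marker/newline tokens.
VOCAB = list("0123456789ABCDEF") + ["<", ">", "d", "e", "n", "\n"]
STOI = {ch: i for i, ch in enumerate(VOCAB)}

def encode(text: str) -> list[int]:
    """Map each character of a register-dump line to a token ID."""
    return [STOI[ch] for ch in text]

frame = "B0080005410A306011C0064108200016800D41082000B4031F\n"
ids = encode(frame)                  # 50 hex chars + newline = 51 tokens
regs = bytes.fromhex(frame.strip())  # back to 25 raw register bytes
```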
|
|
|
|
|
### Inference Optimizations |
|
|
|
|
|
The Zig engine includes: |
|
|
- **KV Cache**: 50-100x speedup for autoregressive generation |
|
|
- **SIMD**: `@Vector(8, f32)` operations, 24x speedup
|
|
- **Sliding Window**: Infinite generation beyond context length |
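The sliding window is what lets a 12-frame model emit 700-frame songs: the generation loop simply truncates its input to the most recent tokens. A minimal sketch (the real engine does this in Zig with a KV cache; `next_token_fn` stands in for the model's forward pass plus sampling):

```python
def generate(next_token_fn, prompt, num_tokens, context=612):
    """Sliding-window autoregressive loop: the model only ever sees the
    last `context` tokens (12 frames x 51 tokens/frame = 612 by default),
    so total output length is unbounded."""
    out = list(prompt)
    for _ in range(num_tokens):
        window = out[-context:]            # drop tokens older than the window
        out.append(next_token_fn(window))
    return out

# Toy stand-in "model" that echoes the oldest token in its window:
seq = generate(lambda w: w[0], [1, 2, 3], 5, context=2)
```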
|
|
|
|
|
## Citation |
|
|
```bibtex |
|
|
@misc{sidgpt2026, |
|
|
author = {Mario Schallner}, |
|
|
title = {SID-GPT: Transformer-based Commodore 64 Music Generation}, |
|
|
year = {2026}, |
|
|
publisher = {Hugging Face}, |
|
|
url = {https://huggingface.co/M64/sid-gpt-25m} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Links |
|
|
|
|
|
- [GitHub Repository](https://github.com/M64GitHub/SidGPT) |
|
|
- [Training Dataset](https://huggingface.co/datasets/M64/sid-music): 1GB of register dump sequences (2,410 SID files)
|
|
- [HVSC - Training Data Source](https://hvsc.c64.org/) |
|
|
- [Blog Post](#) *(coming soon)* |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Thanks to the legendary C64 composers whose work made this possible: Matt Gray, Jeroen Tel, Rob Hubbard, Martin Galway, DRAX, Laxity, and all contributors to HVSC. |
|
|
|