SEMACS: Semantic Audio Codec System
Requirements
- Python 3.9+
- CUDA 11.8+ (multi-GPU training uses `accelerate`)
Install
```bash
pip install -r requirements.txt
```
Note: `torchtune` is required for the rotary positional embeddings used inside the codec encoder. Install it separately if the version on PyPI does not match your torch version:
```bash
pip install torchtune
```
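Before training, it can help to confirm the Python requirement and see which versions of `torch`, `torchtune` and `accelerate` are actually installed. The sketch below uses only the standard library. `check_environment` is a hypothetical helper, not part of this repository, and it does not know which version combinations are compatible. It only reports what is installed.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(min_python=(3, 9), packages=("torch", "torchtune", "accelerate")):
    """Report whether Python meets min_python and the installed version of each package.

    A package maps to None if it cannot be imported, or to "unknown" if it is
    importable but has no distribution metadata.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = version(name)  # version from installed package metadata
        except PackageNotFoundError:
            report[name] = "unknown"
    return report
```

For example, `print(check_environment())` shows at a glance whether `torchtune` is missing (`None`) before you launch a long multi-GPU run.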
Data Preparation
Training expects plain-text file lists with one audio file per line, written as the audio path followed by `||`:
```
/path/to/audio1.wav||
/path/to/audio2.wav||
```
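A file list in this format can be generated from a directory of audio files. The sketch below is a hypothetical helper (`write_filelist` is not part of this repository). It assumes `.wav` input and reproduces the trailing `||` exactly as in the example lines. This README does not document what, if anything, belongs after the path.

```python
from pathlib import Path


def write_filelist(audio_dir, out_path, pattern="*.wav"):
    """Write one '<absolute audio path>||' line per matching file under audio_dir.

    Returns the number of lines written.
    """
    paths = sorted(Path(audio_dir).rglob(pattern))  # recursive search, stable order
    with open(out_path, "w", encoding="utf-8") as f:
        for p in paths:
            # The trailing '||' mirrors the README's example lines; the (empty)
            # fields after the path are not documented, so they are left blank.
            f.write(f"{p.resolve()}||\n")
    return len(paths)
```

Usage: `write_filelist("/data/speech/train", "/path/to/train_filelist.txt")`. Splitting train and validation files is left to you.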
Update `config/config.yaml` to point to your lists:
```yaml
data:
  train_filelists:
    - /path/to/train_filelist.txt
  val_filelist: /path/to/val_filelist.txt
```
Audio is automatically resampled to 16 kHz and segmented into 6-second chunks.
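At 16 kHz, a 6-second chunk is 16,000 × 6 = 96,000 samples. The sketch below shows that arithmetic with a naive linear-interpolation resampler and fixed-length segmentation on mono NumPy audio. This is an illustration, not the repository's loader. The real pipeline presumably uses a proper band-limited resampler. This README also does not say whether a short final chunk is padded, dropped, or randomly cropped, so padding is only an option here (`pad_last`). The helper names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate stated in this README
SEGMENT_SECONDS = 6  # chunk length stated in this README
SEGMENT_SAMPLES = TARGET_SR * SEGMENT_SECONDS  # 96,000 samples per chunk


def resample_linear(wav, orig_sr, target_sr=TARGET_SR):
    """Naive linear-interpolation resampling of mono audio (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return wav
    n_out = int(round(len(wav) / orig_sr * target_sr))
    t_in = np.arange(len(wav)) / orig_sr  # input sample times in seconds
    t_out = np.arange(n_out) / target_sr  # output sample times in seconds
    return np.interp(t_out, t_in, wav)


def segment(wav, segment_samples=SEGMENT_SAMPLES, pad_last=True):
    """Split mono audio into fixed-length chunks; zero-pad or drop the short tail."""
    chunks = []
    for start in range(0, len(wav), segment_samples):
        chunk = wav[start:start + segment_samples]
        if len(chunk) < segment_samples:
            if not pad_last:
                break
            chunk = np.pad(chunk, (0, segment_samples - len(chunk)))
        chunks.append(chunk)
    if not chunks:
        return np.empty((0, segment_samples))
    return np.stack(chunks)
```

For example, 15 seconds of 44.1 kHz audio becomes 240,000 samples at 16 kHz. That yields three 6-second chunks with padding, or two without.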
Training
Single GPU
```bash
python train.py
```
Multi-GPU (recommended)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --num_processes=8 train.py
```
Or use the provided script:
```bash
bash train.sh
```
Key Hyperparameters (config/config.yaml)
| Parameter | Default | Description |
|---|---|---|
| `train.batch_size` | 4 | Per-device batch size |
| `train.num_epochs` | 2 | Total training epochs |
| `train.save_steps` | 20000 | Checkpoint interval (steps) |
| `train.output_dir` | `./output/codec_ganv5_large1` | Output directory |
| `train.lr_scheduler` | `cosine_with_restarts` | Learning-rate schedule |
| `train.warmup_ratio` | 0.05 | Warm-up fraction |
| `train.max_grad_norm` | 0.5 | Gradient clipping |
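To see how these defaults interact, the sketch below estimates the number of optimizer steps and evaluates a warm-up plus cosine-with-restarts multiplier. It makes several assumptions that are not stated in this README:

- `cosine_with_restarts` behaves like the Hugging Face Transformers scheduler of the same name, with linear warm-up and then cosine decay with hard restarts.
- Warm-up steps are computed as `ceil(total_steps * warmup_ratio)`.
- There is no gradient accumulation, and the last partial batch is kept.

`num_segments` (the number of 6-second chunks in your data) is a hypothetical input, and both function names are illustrative.

```python
import math


def total_training_steps(num_segments, per_device_batch=4, num_processes=8, num_epochs=2):
    """Optimizer steps, assuming no gradient accumulation and a kept final partial batch."""
    global_batch = per_device_batch * num_processes  # 4 x 8 = 32 with the defaults above
    steps_per_epoch = math.ceil(num_segments / global_batch)
    return steps_per_epoch * num_epochs


def cosine_with_restarts_lr(step, total_steps, warmup_ratio=0.05, num_cycles=1):
    """LR multiplier in [0, 1]: linear warm-up, then cosine decay with hard restarts."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each of num_cycles cycles restarts at 1.0 and decays to 0 along a half cosine.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
```

With 320,000 segments on 8 GPUs, this gives 20,000 steps over 2 epochs. With `warmup_ratio` 0.05, the first 1,000 steps are warm-up. The number of restart cycles used by `train.py` is not documented here. `num_cycles=1` reduces to a single cosine decay.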
Loss Weights
| Loss | Weight | Description |
|---|---|---|
| Mel spectrogram | 15.0 | Multi-resolution mel loss |
| STFT | 1.0 | Multi-resolution STFT loss |
| Semantic reconstruction | 5.0 | KL / feature matching vs. Whisper |
| Speaker reconstruction | 5.0 | Cosine similarity in speaker space |
| Adversarial | 1.0 | GAN generator loss |
| Discriminator | 1.0 | GAN discriminator loss |
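One plausible reading of this table is that the generator minimizes a weighted sum of the first five terms, while the discriminator loss drives a separate discriminator update. The sketch below assumes that reading. It also assumes the speaker term is implemented as 1 minus cosine similarity, which is a common way to turn "cosine similarity in speaker space" into a loss. The dictionary keys and helper names are hypothetical. Only the weights come from the table.

```python
import math

# Weights copied from the table above; key names are illustrative.
GENERATOR_LOSS_WEIGHTS = {
    "mel": 15.0,
    "stft": 1.0,
    "semantic": 5.0,
    "speaker": 5.0,
    "adversarial": 1.0,
}
DISCRIMINATOR_LOSS_WEIGHT = 1.0  # applied in the separate discriminator step


def speaker_cosine_loss(a, b, eps=1e-8):
    """1 - cosine similarity between two speaker embeddings (0 when aligned)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm, eps)


def generator_loss(terms, weights=GENERATOR_LOSS_WEIGHTS):
    """Weighted sum of generator-side loss terms; a missing term raises KeyError."""
    return sum(weights[name] * terms[name] for name in weights)
```

The 15.0 mel weight means a unit of mel error costs 15 times as much as a unit of STFT or adversarial loss in this sum. That makes spectral fidelity the dominant generator objective.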
Checkpoints
Checkpoints are saved under `train.output_dir` every `train.save_steps` steps. Each checkpoint contains:
- `model.safetensors`: generator weights
- `discriminator_mpd.pt` / `discriminator_mstft.pt`: discriminator weights
- `config.json`: model config
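Before resuming or evaluating, you can check that a checkpoint directory contains the files listed above. The standard-library sketch below reports any missing files and parses `config.json`. `inspect_checkpoint` is a hypothetical helper, not part of this repository. It does not load any weights. For that, the real `safetensors.torch.load_file` (generator) and `torch.load` (discriminators) functions would be the usual tools.

```python
import json
from pathlib import Path

# File names listed in this README for each checkpoint.
EXPECTED_FILES = (
    "model.safetensors",
    "discriminator_mpd.pt",
    "discriminator_mstft.pt",
    "config.json",
)


def inspect_checkpoint(ckpt_dir):
    """Return {'missing': [...], 'config': dict or None} for a checkpoint directory."""
    ckpt = Path(ckpt_dir)
    missing = [name for name in EXPECTED_FILES if not (ckpt / name).is_file()]
    config = None
    config_path = ckpt / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    return {"missing": missing, "config": config}
```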
Resume training from a checkpoint by setting:
```yaml
train:
  resume_from_checkpoint: ./output/codec_ganv5_large1/checkpoint-XXXXX
```
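To fill in `resume_from_checkpoint`, you usually want the most recent checkpoint. The sketch below assumes that checkpoint directories are named `checkpoint-<number>`, where the number increases with training progress. The `XXXXX` placeholder above suggests this, but the README does not spell it out. The helper compares the numbers as integers rather than strings, so `checkpoint-100000` correctly outranks `checkpoint-20000`. `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<N> directory with the largest N, or None if none exist."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CKPT_RE.match(child.name)
        # Numeric comparison avoids the lexicographic trap ("20000" > "100000" as text).
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Usage: `latest_checkpoint("./output/codec_ganv5_large1")` returns a `Path` you can paste into `config/config.yaml`.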