SEMACS — Semantic Audio Codec System

Requirements

Python 3.9+
CUDA 11.8+ (multi-GPU training uses accelerate)

Install

pip install -r requirements.txt

Note: torchtune is required for the rotary positional embeddings used inside the codec encoder. If the version on PyPI does not match your torch version, install it separately:
pip install torchtune

Data Preparation

Training expects plain-text file lists in which each line is an audio path followed by `||`:

/path/to/audio1.wav||
/path/to/audio2.wav||
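
A file list in this format can be generated with a short script. The sketch below is not part of the repository: `write_filelist` is a hypothetical helper, the `*.wav` pattern is an assumption, and the `||` suffix simply mirrors the example lines above.

```python
from pathlib import Path


def write_filelist(audio_dir, out_path, pattern="*.wav"):
    """Write one '<absolute path>||' line per matching file under audio_dir.

    Hypothetical helper, not part of SEMACS. The '||' suffix copies the
    example lines in this README; its meaning is not documented here.
    """
    # Search recursively and sort so the list is reproducible across runs.
    paths = sorted(p.resolve() for p in Path(audio_dir).rglob(pattern))
    with open(out_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(f"{p}||\n")
    return len(paths)
```

For example, `write_filelist("/path/to/audio", "/path/to/train_filelist.txt")` returns the number of entries written.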

Update config/config.yaml to point to your lists:

data:
  train_filelists:
    - /path/to/train_filelist.txt
  val_filelist: /path/to/val_filelist.txt

Audio is automatically resampled to 16 kHz and segmented into 6-second chunks.
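
At 16 kHz, a 6-second chunk is 96,000 samples. The sketch below shows one way such segmentation could work on an already-resampled waveform. It is an illustration only: the README does not say how the training loader treats trailing audio shorter than one chunk (here it is dropped), and `chunk_bounds` is a hypothetical name.

```python
SAMPLE_RATE = 16_000   # Hz, from the README
CHUNK_SECONDS = 6      # seconds, from the README


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices of consecutive full-length chunks.

    Assumption: a trailing remainder shorter than one chunk is dropped.
    """
    chunk = sample_rate * seconds
    return [(s, s + chunk) for s in range(0, num_samples - chunk + 1, chunk)]


# A 20-second clip at 16 kHz has 320,000 samples and yields three full chunks.
print(chunk_bounds(20 * SAMPLE_RATE))
```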

Training

Single GPU

python train.py

Multi-GPU (recommended)

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --num_processes=8 train.py

Or use the provided script:

bash train.sh

Key Hyperparameters (`config/config.yaml`)

Parameter	Default	Description
`train.batch_size`	4	Per-device batch size
`train.num_epochs`	2	Total training epochs
`train.save_steps`	20000	Checkpoint interval
`train.output_dir`	`./output/codec_ganv5_large1`	Output directory
`train.lr_scheduler`	`cosine_with_restarts`	LR schedule
`train.warmup_ratio`	0.05	Warm-up fraction
`train.max_grad_norm`	0.5	Gradient clipping
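
To see how these defaults interact with the 8-process launch command, the sketch below estimates the effective batch size and the number of warm-up steps. It assumes no gradient accumulation and that `train.warmup_ratio` is applied to the total optimizer step count. The dataset size is a made-up number for illustration, and `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_samples, batch_size=4, num_processes=8,
                      num_epochs=2, warmup_ratio=0.05):
    """Estimate step counts from the README defaults.

    Assumptions: gradient accumulation of 1, an incomplete final batch is
    kept, and warm-up steps = warmup_ratio * total steps (rounded down).
    """
    effective_batch = batch_size * num_processes
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = int(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


# Hypothetical dataset of 1,000,000 training chunks.
print(training_schedule(1_000_000))  # (32, 62500, 3125)
```

Under these assumptions, the default `train.save_steps` of 20000 would write three checkpoints over the 62,500 steps of such a run.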

Loss Weights

Loss	Weight	Description
Mel spectrogram	15.0	Multi-resolution mel loss
STFT	1.0	Multi-resolution STFT loss
Semantic reconstruction	5.0	KL / feature matching vs. Whisper
Speaker reconstruction	5.0	Cosine similarity in speaker space
Adversarial	1.0	GAN generator loss
Discriminator	1.0	GAN discriminator loss
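
Assuming the generator objective is a plain weighted sum of the terms listed here (a common arrangement for GAN-based codecs, though this README does not state it), the combination could be sketched as follows. The dictionary keys and the function name are hypothetical. The discriminator loss is left out because it is normally optimized in a separate step.

```python
# Weights copied from the Loss Weights table; key names are illustrative.
GENERATOR_LOSS_WEIGHTS = {
    "mel": 15.0,
    "stft": 1.0,
    "semantic": 5.0,
    "speaker": 5.0,
    "adversarial": 1.0,
}


def total_generator_loss(terms, weights=GENERATOR_LOSS_WEIGHTS):
    """Weighted sum of per-term loss values (floats or tensors).

    Assumption: the terms are combined linearly with the table's weights.
    """
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Example with made-up scalar loss values.
example = {"mel": 0.5, "stft": 1.0, "semantic": 0.2,
           "speaker": 0.1, "adversarial": 2.0}
print(total_generator_loss(example))
```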

Checkpoints

Checkpoints are saved under `train.output_dir` every `train.save_steps` steps. Each checkpoint contains:

model.safetensors — generator weights
discriminator_mpd.pt / discriminator_mstft.pt — discriminator weights
config.json — model config

Resume training from a checkpoint by setting:

train:
  resume_from_checkpoint: ./output/codec_ganv5_large1/checkpoint-XXXXX
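
To fill in the `checkpoint-XXXXX` placeholder, one option is to pick the highest-numbered checkpoint directory that contains the generator weights. The helper below is a sketch, not part of the repository. It assumes directories are named `checkpoint-<step>` as in the path above, and `latest_checkpoint` is a hypothetical name.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the largest step, or None.

    Assumption: only directories that contain model.safetensors count as
    complete checkpoints.
    """
    best_step, best_dir = -1, None
    for d in Path(output_dir).glob("checkpoint-*"):
        m = re.fullmatch(r"checkpoint-(\d+)", d.name)
        if m and d.is_dir() and (d / "model.safetensors").is_file():
            step = int(m.group(1))
            if step > best_step:
                best_step, best_dir = step, d
    return best_dir
```

Comparing the step as an integer avoids the lexicographic trap where `checkpoint-20000` would sort after `checkpoint-100000`.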
