
SEMACS – Semantic Audio Codec System

Requirements

  • Python 3.9+
  • CUDA 11.8+ (multi-GPU training uses accelerate)

Install

pip install -r requirements.txt

Note: torchtune is required for the rotary positional embeddings used inside the codec encoder. Install it separately if the version on PyPI does not match your torch version:

pip install torchtune
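
torchtune supplies the rotary positional embedding (RoPE) modules the encoder uses. As a rough illustration of what RoPE computes (this is not the torchtune API, just a minimal pure-Python sketch):

```python
import math

def apply_rotary(vec, pos, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by a position-dependent
    angle, as in rotary positional embeddings. Assumes even length."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # rotation frequency falls with dimension
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out
```

Because each pair is only rotated, position 0 is the identity and vector norms are preserved; in practice torchtune applies this to query/key tensors batch-wise.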

Data Preparation

Training expects plain text file lists where each line is an audio path:

/path/to/audio1.wav||
/path/to/audio2.wav||
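
Each line ends with `||` (empty trailing fields). Assuming only the path field is consumed, a minimal parser (hypothetical helper, not part of the repo) looks like:

```python
def parse_filelist(path):
    """Read a filelist and return the audio paths
    (the first ||-separated field of each non-empty line)."""
    paths = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            paths.append(line.split("||")[0])
    return paths
```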

Update config/config.yaml to point to your lists:

data:
  train_filelists:
    - /path/to/train_filelist.txt
  val_filelist: /path/to/val_filelist.txt

Audio is automatically resampled to 16 kHz and segmented into 6-second chunks.
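
At 16 kHz, a 6-second chunk is 96,000 samples. The segmentation arithmetic can be sketched as follows (a hypothetical helper; whether the trainer drops or pads the trailing remainder is an assumption here):

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_seconds=6.0):
    """Return (start, end) sample indices of full-length chunks;
    a trailing remainder shorter than one chunk is dropped."""
    chunk = int(sample_rate * chunk_seconds)  # 96_000 samples at 16 kHz
    return [(s, s + chunk) for s in range(0, num_samples - chunk + 1, chunk)]
```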

Training

Single GPU

python train.py

Multi-GPU (recommended)

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --num_processes=8 train.py

Or use the provided script:

bash train.sh

Key Hyperparameters (config/config.yaml)

Parameter            Default                       Description
train.batch_size     4                             Per-device batch size
train.num_epochs     2                             Total training epochs
train.save_steps     20000                         Checkpoint interval
train.output_dir     ./output/codec_ganv5_large1   Output directory
train.lr_scheduler   cosine_with_restarts          LR schedule
train.warmup_ratio   0.05                          Warm-up fraction
train.max_grad_norm  0.5                           Gradient clipping
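
With lr_scheduler: cosine_with_restarts and warmup_ratio: 0.05, the learning-rate curve warms up linearly and then decays in hard-restarted cosine cycles. A sketch of the standard schedule shape (as in HF transformers' scheduler of the same name; the exact cycle count used here is an assumption):

```python
import math

def cosine_with_restarts(step, total_steps, base_lr,
                         warmup_ratio=0.05, num_cycles=1):
    """Linear warmup over warmup_ratio * total_steps, then `num_cycles`
    cosine decays from base_lr to 0, each restarting at base_lr."""
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    if progress >= 1.0:
        return 0.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
```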

Loss Weights

Loss                     Weight   Description
Mel spectrogram          15.0     Multi-resolution mel loss
STFT                     1.0      Multi-resolution STFT loss
Semantic reconstruction  5.0      KL / feature matching vs. Whisper
Speaker reconstruction   5.0      Cosine similarity in speaker space
Adversarial              1.0      GAN generator loss
Discriminator            1.0      GAN discriminator loss
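
The generator objective is the weighted sum of the non-discriminator terms above. A sketch (the dictionary keys are illustrative names, not the repo's identifiers):

```python
# Weights from the table above; names are illustrative.
LOSS_WEIGHTS = {
    "mel": 15.0,
    "stft": 1.0,
    "semantic": 5.0,
    "speaker": 5.0,
    "adversarial": 1.0,
}

def generator_loss(losses, weights=LOSS_WEIGHTS):
    """Combine per-term loss values into the total generator loss."""
    return sum(weights[name] * value for name, value in losses.items())
```

The discriminator loss (weight 1.0) is optimized separately in the usual GAN alternating-update fashion.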

Checkpoints

Checkpoints are saved under train.output_dir every save_steps steps. Each checkpoint contains:

  • model.safetensors – generator weights
  • discriminator_mpd.pt / discriminator_mstft.pt – discriminator weights
  • config.json β€” model config

Resume training from a checkpoint by setting:

train:
  resume_from_checkpoint: ./output/codec_ganv5_large1/checkpoint-XXXXX
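
Since checkpoints are named checkpoint-<step>, a small helper (hypothetical, not part of the repo) can resolve the newest one under train.output_dir:

```python
import os
import re

def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step
    count, or None if no checkpoints exist yet."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best, best_step = None, -1
    for name in os.listdir(output_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_step:
            best, best_step = name, int(m.group(1))
    return os.path.join(output_dir, best) if best else None
```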