license: apache-2.0
tags:
  - audio
  - music
  - audio-processing
  - mastering
  - enhancement
  - flow-matching
  - siren
pipeline_tag: audio-to-audio
library_name: pytorch

SIREN-MASTER

Neural Audio Enhancement and Mastering with Flow Matching

SIREN-MASTER is part of the SIREN Audio Suite - a family of neural audio processing models designed for professional music production workflows.

Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. It is trained on pairs of raw mixes and their professionally mastered versions, so it learns the raw-to-mastered transformation directly from the decisions human mastering engineers made on that material.

Key capabilities:

  • Automatic mastering - Professional-quality mastering in one pass
  • Audio enhancement - Improve clarity, punch, and presence
  • Dynamic processing - Intelligent compression and limiting
  • Tonal balance - Optimal frequency distribution
  • Stereo imaging - Enhanced width and depth

Architecture

| Component | Details |
| --- | --- |
| Base Architecture | Flow Matching (Continuous Normalizing Flow) |
| Model Size | 40 MB |
| Training Phases | 2 (Foundation + Enhancement) |
| Sample Rate | 44.1 kHz |

Flow Matching provides:

  • Stable training - Regresses a velocity field directly, which tends to train more stably than diffusion models
  • Fast inference - Typically needs fewer sampling steps than diffusion
  • High fidelity - Excellent audio quality preservation
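
The inference point can be illustrated with a toy sampler. Flow Matching models produce output by integrating an ODE dx/dt = v(x, t) from t = 0 to t = 1. The sketch below uses fixed-step Euler integration as one simple way to do that. `toy_velocity` is a hypothetical stand-in that points straight at a known target; SIREN-MASTER's actual network, solver, and step count are not documented in this card.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1.0 inside the loop
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained network: the velocity of a straight
# path that arrives at TARGET when t reaches 1.
TARGET = [0.5, -0.25, 0.125]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


start = [0.0, 0.0, 0.0]
few_steps = euler_sample(toy_velocity, start, num_steps=4)
many_steps = euler_sample(toy_velocity, start, num_steps=64)
```

On this perfectly straight path, 4 steps and 64 steps land on the same endpoint. A learned velocity field is only approximately straight, so the step count for a real model is an empirical choice.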

Training Pipeline

SIREN-MASTER was trained in two phases:

  1. Phase 1: Foundation (100 epochs)

    • Learn basic audio transformations
    • Build robust feature representations
  2. Phase 2: Enhancement (100 epochs)

    • Fine-tune on mastering pairs
    • Learn professional mastering aesthetics
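
To make "mastering pairs" concrete, here is a minimal sketch of how one Flow Matching training example might be built from a raw/mastered pair. It assumes the common linear-interpolation formulation, with the raw mix as the t = 0 endpoint and the mastered track as the t = 1 endpoint. The card does not say which path or conditioning SIREN-MASTER actually uses, and `make_training_example` and `squared_error` are hypothetical helper names.

```python
import random
from typing import List, Tuple


def make_training_example(raw: List[float], mastered: List[float],
                          t: float) -> Tuple[List[float], List[float]]:
    """Return (x_t, target_velocity) for a linear path from raw to mastered."""
    if len(raw) != len(mastered):
        raise ValueError("raw and mastered must be time-aligned and equal length")
    # Point on the straight line between the two endpoints at time t
    x_t = [(1.0 - t) * r + t * m for r, m in zip(raw, mastered)]
    # Along a straight line the velocity is constant: endpoint minus start
    target_velocity = [m - r for r, m in zip(raw, mastered)]
    return x_t, target_velocity


def squared_error(predicted: List[float], target: List[float]) -> float:
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


raw_mix = [0.10, -0.20, 0.05, 0.00]    # made-up samples, not real audio
mastered = [0.30, -0.50, 0.10, 0.02]   # made-up samples, not real audio
t = random.Random(0).random()          # t drawn uniformly from [0, 1)
x_t, v_target = make_training_example(raw_mix, mastered, t)
loss_if_perfect = squared_error(v_target, v_target)
```

In a real training loop, a network would predict the velocity from `x_t` and `t`, and `squared_error` between that prediction and `v_target` would be the loss to minimize.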

The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

| Model | Purpose |
| --- | --- |
| SIREN-FX | Neural audio effects |
| SIREN-FIX | Audio restoration and repair |
| SIREN-MASTER | Audio enhancement and mastering (this model) |
| SIREN-STEER | Steerable audio transformations |
| SIREN-SEPARATE | Source separation |
| SIREN-TRANSCRIBE | Music analysis (key, tempo, beats) |

Usage

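import torch
import torchaudio

# Load the checkpoint on CPU and extract the weights (a state dict)
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']
# The model class is not included in this card; once it is instantiated,
# restore the weights with model.load_state_dict(model_state)

# Load a raw mix ('raw_mix.wav' is a placeholder path) and resample if needed
waveform, sample_rate = torchaudio.load('raw_mix.wav')
if sample_rate != 44100:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 44100)

# Model expects stereo audio at 44.1 kHz
# Input: raw mix
# Output: mastered audio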

Training Details

  • Training Data: Large-scale mastering dataset (raw/mastered pairs)
  • Training Duration: 200 total epochs (100 Phase 1 + 100 Phase 2)
  • Hardware: NVIDIA B200 GPUs (8-GPU DDP)
  • Batch Size: 256
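
The card does not say whether the batch size of 256 is the global batch or the per-GPU batch. Under DDP each process loads its own batch, so the two readings differ by a factor of the world size. The arithmetic is sketched below with hypothetical helper names.

```python
def per_gpu_batch(global_batch: int, world_size: int) -> int:
    """Per-process batch size when `global_batch` is split evenly across GPUs."""
    if global_batch % world_size != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // world_size


def effective_batch(per_gpu: int, world_size: int) -> int:
    """Samples contributing to each optimizer step across all DDP processes."""
    return per_gpu * world_size


WORLD_SIZE = 8  # 8-GPU DDP, as stated above

# Reading 1: 256 is the global batch
if_global = per_gpu_batch(256, WORLD_SIZE)

# Reading 2: 256 is the per-GPU batch
if_per_gpu = effective_batch(256, WORLD_SIZE)
```

Under the first reading each GPU sees 32 samples per step. Under the second, each optimizer step averages gradients over 2048 samples.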

Intended Use

SIREN-MASTER is designed for:

  • Automatic audio mastering
  • Mix enhancement and polish
  • Reference-quality output preparation
  • Demo/pre-production mastering
  • Research in neural audio enhancement

What SIREN-MASTER Learns

The model captures mastering techniques including:

  • EQ adjustments - Tonal balance and clarity
  • Compression - Dynamic range control
  • Limiting - Loudness maximization
  • Stereo enhancement - Width and imaging
  • Harmonic saturation - Warmth and presence
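
For intuition, two of these techniques have simple classic DSP counterparts. The sketch below shows a static hard-knee compressor curve and a crude hard limiter. These are textbook formulas included only as a reference point. The card does not describe explicit compressor or limiter stages inside SIREN-MASTER, and all function names and parameter values here are illustrative.

```python
import math
from typing import List


def compressor_gain_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static hard-knee compressor: gain in dB (<= 0) for a given input level."""
    if level_db <= threshold_db:
        return 0.0
    # Above the threshold, output rises 1 dB for every `ratio` dB of input
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db


def hard_limit(samples: List[float], ceiling: float) -> List[float]:
    """Clamp samples to [-ceiling, ceiling] (a crude limiter with no lookahead)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def peak_dbfs(samples: List[float]) -> float:
    """Peak level in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


# A -6 dB input through a 4:1 compressor with a -18 dB threshold
gain = compressor_gain_db(level_db=-6.0, threshold_db=-18.0, ratio=4.0)

# Samples that overshoot full scale, clamped to a 0.95 ceiling
limited = hard_limit([0.2, 1.3, -1.1, 0.9], ceiling=0.95)
```

Here `gain` is -9 dB: the input sits 12 dB over the threshold, and a 4:1 ratio lets only 3 dB of that through. A learned model can apply level-dependent changes like this without exposing a threshold or ratio to the user.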

Limitations

  • Optimized for a 44.1 kHz sample rate; resample audio at other rates before inference
  • Best results with full mixes (not individual stems)
  • Mastering style reflects training data aesthetics
  • Not a replacement for genre-specific mastering

License

Apache 2.0

Citation

If you use SIREN-MASTER in your research, please cite:

@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}

Contact

For questions and feedback, please open an issue on the model repository.