SIREN-MASTER

Neural Audio Enhancement and Mastering with Flow Matching

SIREN-MASTER is part of the SIREN Audio Suite - a family of neural audio processing models designed for professional music production workflows.

Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. The model learns the transformation from raw mixes to professionally mastered audio, capturing the nuanced decisions of human mastering engineers.
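As a rough illustration of how a flow matching model can be trained on paired audio, the sketch below builds one training example under the common linear-interpolation formulation. This is an assumption: the model card does not state which probability path SIREN-MASTER uses, or whether the starting point is the raw mix itself or noise conditioned on the raw mix. The function names and the toy sample values are hypothetical.

```python
import random


def flow_matching_example(raw, mastered, t):
    """Return (x_t, target_velocity) for one raw/mastered pair at time t in [0, 1]."""
    if len(raw) != len(mastered):
        raise ValueError("raw and mastered must have the same length")
    # Point on the straight line from the raw samples (t=0) to the mastered samples (t=1).
    x_t = [(1.0 - t) * r + t * m for r, m in zip(raw, mastered)]
    # Along a straight line the velocity is constant: the difference of the endpoints.
    target = [m - r for r, m in zip(raw, mastered)]
    return x_t, target


def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


# Toy "audio" of four samples (hypothetical values, not real data).
raw = [0.10, -0.20, 0.05, 0.00]
mastered = [0.30, -0.50, 0.15, 0.00]

# t is typically drawn uniformly from [0, 1] for each training example.
t = random.random()
x_t, target = flow_matching_example(raw, mastered, t)

# A network would predict the velocity from (x_t, t); here a zero prediction stands in.
loss = mse([0.0] * len(target), target)
```

In this formulation the network is trained by regressing its output onto `target`, so the loss is an ordinary mean squared error.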

Key capabilities:

  • Automatic mastering - Professional-quality mastering in one pass
  • Audio enhancement - Improve clarity, punch, and presence
  • Dynamic processing - Intelligent compression and limiting
  • Tonal balance - Optimal frequency distribution
  • Stereo imaging - Enhanced width and depth

Architecture

Component           Details
Base Architecture   Flow Matching (Continuous Normalizing Flow)
Model Size          40 MB
Training Phases     2 (Foundation + Enhancement)
Sample Rate         44.1 kHz

Flow Matching provides:

  • Stable training - A regression objective on a velocity field, generally more stable to train than diffusion models
  • Fast inference - Typically fewer sampling steps than diffusion
  • High fidelity - Preserves the audio quality of the input
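Inference in a flow matching model amounts to integrating an ordinary differential equation dx/dt = v(x, t) from t = 0 to t = 1. The sketch below uses a fixed-step Euler integrator, which is one simple choice; the model card does not say which solver or step count SIREN-MASTER uses. `toy_velocity`, `START`, and `TARGET` are hypothetical stand-ins for a trained network and real audio.

```python
def euler_integrate(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        # One Euler update: move along the current velocity for a time dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical endpoints standing in for a raw and a mastered signal.
START = [0.10, -0.20, 0.05]
TARGET = [0.30, -0.50, 0.15]


def toy_velocity(x, t):
    """Stand-in for a trained network: the constant velocity of a straight path."""
    return [b - a for a, b in zip(START, TARGET)]


out = euler_integrate(toy_velocity, START, num_steps=4)
```

With a constant velocity field, Euler integration reaches the endpoint regardless of the number of steps. A learned field is only approximately straight, so in practice the step count trades speed against accuracy.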

Training Pipeline

SIREN-MASTER was trained in two phases:

  1. Phase 1: Foundation (100 epochs)
    • Learn basic audio transformations
    • Build robust feature representations
  2. Phase 2: Enhancement (100 epochs)
    • Fine-tune on mastering pairs
    • Learn professional mastering aesthetics
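The two-phase schedule above can be written down as a small lookup, which is handy when a single training loop needs to switch datasets or learning rates at the phase boundary. This is only a sketch: the phase names and lengths come from the list above, while the 0-based epoch numbering and the `phase_for_epoch` helper are assumptions.

```python
# (phase name, number of epochs), in training order.
PHASES = [("foundation", 100), ("enhancement", 100)]


def phase_for_epoch(epoch):
    """Map a 0-based global epoch to (phase_name, epoch_within_phase)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for name, length in PHASES:
        if epoch < start + length:
            return name, epoch - start
        start += length
    raise ValueError(f"epoch {epoch} is beyond the {start}-epoch schedule")


total_epochs = sum(length for _, length in PHASES)
```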

The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

Model              Purpose
SIREN-FX           Neural audio effects
SIREN-FIX          Audio restoration and repair
SIREN-MASTER       Audio enhancement and mastering (this model)
SIREN-STEER        Steerable audio transformations
SIREN-SEPARATE     Source separation
SIREN-TRANSCRIBE   Music analysis (key, tempo, beats)

Usage

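import torch
import torchaudio

# Load the checkpoint on CPU and pull out the model weights
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# The model class is not shown here; after constructing it, restore the
# weights with model.load_state_dict(model_state)

# Load the raw mix ('raw_mix.wav' is a placeholder path)
waveform, sample_rate = torchaudio.load('raw_mix.wav')  # (channels, frames)

# Model expects stereo audio at 44.1 kHz
# Input: raw mix
# Output: mastered audio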
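Because the model expects stereo audio at 44.1 kHz, it can help to check a file before running inference. The sketch below does this with Python's standard `wave` module, which reads uncompressed PCM WAV files only. The `check_wav_format` helper is hypothetical and is not part of the SIREN-MASTER repository.

```python
import wave

EXPECTED_RATE = 44100   # Hz, from the model card
EXPECTED_CHANNELS = 2   # stereo, from the model card


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channel(s), expected {EXPECTED_CHANNELS}")
    return problems


# Example use (the path is a placeholder):
# for problem in check_wav_format("raw_mix.wav"):
#     print(problem)
```

A file that fails the check would need to be resampled or converted to stereo before it is passed to the model.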

Training Details

  • Training Data: Large-scale mastering dataset (raw/mastered pairs)
  • Training Duration: 200 total epochs (100 Phase 1 + 100 Phase 2)
  • Hardware: NVIDIA B200 GPUs (8-GPU DDP)
  • Batch Size: 256
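With DistributedDataParallel, each GPU processes its own batch, so the global batch is the per-GPU batch multiplied by the number of GPUs. The card does not say whether 256 is the global or the per-GPU figure, so the arithmetic below shows both readings. The helper names are hypothetical.

```python
WORLD_SIZE = 8       # GPUs, from the model card
STATED_BATCH = 256   # from the model card; global vs. per-GPU is not specified


def per_gpu_batch(global_batch, world_size):
    """Per-GPU batch size if the stated figure is the global batch."""
    if global_batch % world_size != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // world_size


def global_batch(per_gpu, world_size):
    """Global batch size if the stated figure is the per-GPU batch."""
    return per_gpu * world_size


if_stated_is_global = per_gpu_batch(STATED_BATCH, WORLD_SIZE)    # 32 per GPU
if_stated_is_per_gpu = global_batch(STATED_BATCH, WORLD_SIZE)    # 2048 in total
```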

Intended Use

SIREN-MASTER is designed for:

  • Automatic audio mastering
  • Mix enhancement and polish
  • Reference-quality output preparation
  • Demo/pre-production mastering
  • Research in neural audio enhancement

What SIREN-MASTER Learns

The model captures mastering techniques including:

  • EQ adjustments - Tonal balance and clarity
  • Compression - Dynamic range control
  • Limiting - Loudness maximization
  • Stereo enhancement - Width and imaging
  • Harmonic saturation - Warmth and presence
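For readers less familiar with these techniques, the sketch below shows what dynamic range control means in classical signal-processing terms: a static, hard-knee compressor gain curve. It is a textbook illustration only. SIREN-MASTER is a learned model and is not stated to contain an explicit compressor; the threshold and ratio defaults are arbitrary assumptions.

```python
def compressor_gain_db(level_db, threshold_db=-18.0, ratio=4.0):
    """Gain in dB that a static hard-knee compressor applies at a given input level."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    if level_db <= threshold_db:
        # Below the threshold the signal passes unchanged.
        return 0.0
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# A -6 dB input is 12 dB over the threshold; at 4:1 it is reduced by 9 dB.
reduction = compressor_gain_db(-6.0)

# A limiter is the limiting case of a very high ratio: output never exceeds the threshold.
limiter_reduction = compressor_gain_db(-6.0, ratio=float("inf"))
```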

Limitations

  • Optimized for a 44.1 kHz sample rate
  • Best results with full mixes (not individual stems)
  • Mastering style reflects training data aesthetics
  • Not a replacement for genre-specific mastering

License

Apache 2.0

Citation

If you use SIREN-MASTER in your research, please cite:

@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}

Contact

For questions and feedback, please open an issue on the model repository.
