SIREN-MASTER

Neural Audio Enhancement and Mastering with Flow Matching

SIREN-MASTER is part of the SIREN Audio Suite, a family of neural audio processing models designed for professional music production workflows.

Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. The model learns the transformation from raw mixes to professionally mastered audio, capturing the nuanced decisions of human mastering engineers.

Key capabilities:

- Automatic mastering: professional-quality mastering in one pass
- Audio enhancement: improved clarity, punch, and presence
- Dynamic processing: intelligent compression and limiting
- Tonal balance: optimal frequency distribution
- Stereo imaging: enhanced width and depth

Architecture

Component	Details
Base Architecture	Flow Matching (Continuous Normalizing Flow)
Model Size	40 MB
Training Phases	2 (Foundation + Enhancement)
Sample Rate	44.1 kHz

Flow Matching provides:

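- Stable training: the objective is a direct regression onto a velocity field, which tends to train more stably than diffusion objectives
- Fast inference: sampling typically needs fewer integration steps than diffusion
- High fidelity: the audio quality of the input is preserved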
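
The step-count claim is easier to see in code. The following is a minimal sketch of flow-matching inference as fixed-step Euler integration of a velocity field from t = 0 to t = 1. It is not SIREN-MASTER's actual sampler: `euler_sample`, `velocity_fn`, and `toy_velocity` are hypothetical names introduced for illustration, and plain Python lists stand in for audio tensors.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 8,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1.0 inside the loop
        v = velocity_fn(x, t)  # one "network" evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a trained network (assumption): the conditional velocity
# of the straight-line path x_t = (1 - t) * x0 + t * target.
target = [0.5, -0.25]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


result = euler_sample(toy_velocity, x0=[0.0, 0.0], num_steps=8)
```

Each step costs one evaluation of the velocity function, so `num_steps` trades speed against integration accuracy. In this toy case the path is a straight line, so even a single step lands on the target; a real model's trajectory is generally curved and needs more steps.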

Training Pipeline

SIREN-MASTER was trained in two phases:

Phase 1: Foundation (100 epochs)
- Learn basic audio transformations
- Build robust feature representations
Phase 2: Enhancement (100 epochs)
- Fine-tune on mastering pairs
- Learn professional mastering aesthetics
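
The card does not state the training objective for either phase. As a rough illustration only, the sketch below shows the standard conditional flow matching regression target on a straight-line path between a (raw, mastered) pair. Treating the raw mix as the start of the path is an assumption (many flow matching models start from noise and condition on the input instead), and `flow_matching_pair` and `mse` are hypothetical helper names.

```python
from typing import List, Tuple

Vector = List[float]


def flow_matching_pair(
    raw: Vector, mastered: Vector, t: float
) -> Tuple[Vector, Vector]:
    """Return (x_t, target_velocity) for the straight-line path raw -> mastered."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(raw) != len(mastered):
        raise ValueError("raw and mastered must have the same length")
    # Point on the path at time t, and the constant velocity along that path.
    x_t = [(1.0 - t) * r + t * m for r, m in zip(raw, mastered)]
    target_velocity = [m - r for r, m in zip(raw, mastered)]
    return x_t, target_velocity


def mse(predicted: Vector, target: Vector) -> float:
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


x_t, v_target = flow_matching_pair(raw=[0.0, 0.0], mastered=[1.0, 2.0], t=0.5)
# A network would be trained so that its prediction at (x_t, t) matches v_target.
loss_if_network_predicts_zero = mse([0.0, 0.0], v_target)
```

In a real training loop, `t` would be sampled at random for each example and the loss averaged over a batch of audio segments.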

The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

Model	Purpose
SIREN-FX	Neural audio effects
SIREN-FIX	Audio restoration and repair
SIREN-MASTER	Audio enhancement and mastering (this model)
SIREN-STEER	Steerable audio transformations
SIREN-SEPARATE	Source separation
SIREN-TRANSCRIBE	Music analysis (key, tempo, beats)

Usage

import torch
import torchaudio

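# Load the checkpoint on CPU; the weights are stored under 'model_state_dict'
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# The model class is not included in this card. Once it is instantiated,
# load the weights with model.load_state_dict(model_state).

# The model expects stereo audio at 44.1 kHz.
# Input: raw mix
# Output: mastered audio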
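
Because the model expects stereo input at 44.1 kHz, it can help to verify a file before running inference. The helper below is a small sketch using only Python's standard-library `wave` module; `check_wav_format` is a hypothetical name, not part of the SIREN-MASTER repository, and it only handles uncompressed PCM WAV files.

```python
import wave
from typing import List


def check_wav_format(
    path: str, expected_rate: int = 44100, expected_channels: int = 2
) -> List[str]:
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(
            f"file has {channels} channel(s), expected {expected_channels}"
        )
    return problems
```

An empty list means the file already matches. Otherwise, resample or convert the channel layout first, for example with `torchaudio.functional.resample`.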

Training Details

Training Data: Large-scale mastering dataset (raw/mastered pairs)
Training Duration: 200 total epochs (100 Phase 1 + 100 Phase 2)
Hardware: NVIDIA B200 GPUs (8-GPU DDP)
Batch Size: 256

Intended Use

SIREN-MASTER is designed for:

- Automatic audio mastering
- Mix enhancement and polish
- Reference-quality output preparation
- Demo/pre-production mastering
- Research in neural audio enhancement

What SIREN-MASTER Learns

The model captures mastering techniques including:

- EQ adjustments: tonal balance and clarity
- Compression: dynamic range control
- Limiting: loudness maximization
- Stereo enhancement: width and imaging
- Harmonic saturation: warmth and presence
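
The card does not say how these effects are measured. One simple, widely used way to quantify the dynamics-related items (compression and limiting) is the crest factor, the gap between peak and RMS level in dB. The sketch below computes it in plain Python so that a raw mix and its mastered output can be compared. The function names are hypothetical, and this is an illustration, not part of SIREN-MASTER.

```python
import math
from typing import Sequence

_FLOOR = 1e-12  # avoids log10(0) on digital silence


def peak_dbfs(samples: Sequence[float]) -> float:
    """Peak level in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(max(peak, _FLOOR))


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, _FLOOR))


def crest_factor_db(samples: Sequence[float]) -> float:
    """Peak-to-RMS ratio in dB; lower values indicate denser dynamics."""
    return peak_dbfs(samples) - rms_dbfs(samples)


# One second of a full-scale 441 Hz sine at 44.1 kHz (a whole number of periods).
sine = [math.sin(2.0 * math.pi * 441 * n / 44100) for n in range(44100)]
# Crude hard clip at half scale, standing in for heavy limiting.
clipped = [max(-0.5, min(0.5, s)) for s in sine]

sine_crest = crest_factor_db(sine)
clipped_crest = crest_factor_db(clipped)
```

A pure sine has a crest factor of about 3 dB, and the clipped copy comes out lower. Heavily limited masters likewise tend to show a lower crest factor than the raw mix they were made from.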

Limitations

- Optimized for 44.1 kHz audio; other sample rates should be resampled first
- Best results with full mixes (not individual stems)
- Mastering style reflects the aesthetics of the training data
- Not a replacement for genre-specific mastering

License

Apache 2.0

Citation

If you use SIREN-MASTER in your research, please cite:

@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}

Contact

For questions and feedback, please open an issue on the model repository.
