SIREN-MASTER
Neural Audio Enhancement and Mastering with Flow Matching
SIREN-MASTER is part of the SIREN Audio Suite, a family of neural audio processing models designed for professional music production workflows.
Model Description
SIREN-MASTER enhances and masters audio using a Flow Matching architecture. The model learns the transformation from raw mixes to professionally mastered audio, capturing the nuanced decisions of human mastering engineers.
Key capabilities:
- Automatic mastering - Professional-quality mastering in one pass
- Audio enhancement - Improve clarity, punch, and presence
- Dynamic processing - Intelligent compression and limiting
- Tonal balance - Optimal frequency distribution
- Stereo imaging - Enhanced width and depth
Architecture
| Component | Details |
|---|---|
| Base Architecture | Flow Matching (Continuous Normalizing Flow) |
| Model Size | 40 MB |
| Training Phases | 2 (Foundation + Enhancement) |
| Sample Rate | 44.1 kHz |
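Flow Matching provides:
- Stable training - The network regresses onto a target velocity field, which is typically more stable to train than diffusion objectives
- Fast inference - Sampling integrates an ODE and typically needs fewer solver steps than diffusion sampling
- High fidelity - Output stays close to the input signal, preserving audio quality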
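To make the "fewer steps" point concrete, here is a minimal sketch of how flow-matching inference is commonly done: a fixed-step Euler integration of a learned velocity field from t=0 to t=1. The function `euler_sample`, the step count, and the toy velocity field are illustrative assumptions, not SIREN-MASTER's actual sampler or network.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for a trained network (assumption): a constant velocity that
# carries a synthetic "raw" signal to a synthetic "mastered" one in a straight line.
rng = np.random.default_rng(0)
raw = rng.standard_normal((2, 1024))   # (channels, samples), synthetic input
mastered = 0.5 * raw                   # synthetic target, not real mastering

def toy_velocity(x, t):
    return mastered - raw

out = euler_sample(toy_velocity, raw, num_steps=4)
print(np.allclose(out, mastered))
```

With a straight-line path the velocity is constant, so even a handful of Euler steps lands on the target. A real network only approximates this field, so the step count becomes a quality/speed trade-off.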
Training Pipeline
SIREN-MASTER was trained in two phases:
Phase 1: Foundation (100 epochs)
- Learn basic audio transformations
- Build robust feature representations
Phase 2: Enhancement (100 epochs)
- Fine-tune on mastering pairs
- Learn professional mastering aesthetics
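The card does not state the training loss. As an illustration only, the sketch below shows the standard linear-path flow matching objective applied to raw/mastered pairs: interpolate between the two signals at a random time and regress the network onto the constant velocity between them. The names `flow_matching_batch` and `flow_matching_loss`, the tensor shapes, and the synthetic data are all assumptions.

```python
import numpy as np

def flow_matching_batch(raw, mastered, rng):
    """Build one training batch for linear-path flow matching.

    Returns the interpolated input x_t, the sampled times t, and the
    target velocity a network would regress onto.
    """
    batch = raw.shape[0]
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1))  # one time per example
    x_t = (1.0 - t) * raw + t * mastered           # point on the straight path
    target_velocity = mastered - raw               # d(x_t)/dt for this path
    return x_t, t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 2, 512))  # (batch, channels, samples), synthetic
mastered = np.tanh(1.5 * raw)           # synthetic stand-in for mastered audio

x_t, t, target = flow_matching_batch(raw, mastered, rng)
perfect_loss = flow_matching_loss(target, target)
zero_model_loss = flow_matching_loss(np.zeros_like(target), target)
print(perfect_loss, zero_model_loss)
```

In an actual training loop the prediction would come from the network evaluated at `(x_t, t)`. Here the two loss values only bracket a perfect predictor and a predictor that outputs zeros.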
The SIREN Family
SIREN-MASTER is part of a suite of audio AI models:
| Model | Purpose |
|---|---|
| SIREN-FX | Neural audio effects |
| SIREN-FIX | Audio restoration and repair |
| SIREN-MASTER | Audio enhancement and mastering (this model) |
| SIREN-STEER | Steerable audio transformations |
| SIREN-SEPARATE | Source separation |
| SIREN-TRANSCRIBE | Music analysis (key, tempo, beats) |
Usage
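```python
import torch
import torchaudio

# Load the checkpoint on CPU and pull out the model weights
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# model_state is a state dict, not a callable model: load it into a
# matching model instance (the model class is not shown in this card)
# with model.load_state_dict(model_state).

# The model expects stereo audio at 44.1 kHz
# Input: raw mix
# Output: mastered audio
```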
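Since the model expects stereo audio at 44.1 kHz, it can help to validate inputs before inference. The following is a minimal sketch using NumPy only. `prepare_input` is a hypothetical helper, not part of the SIREN-MASTER release, and the `(channels, samples)` layout is an assumption.

```python
import numpy as np

TARGET_SAMPLE_RATE = 44100  # sample rate stated in this model card

def prepare_input(waveform, sample_rate):
    """Return a float32 array shaped (2, num_samples) at 44.1 kHz.

    `waveform` is (num_samples,) or (channels, num_samples). Mono input is
    duplicated to both channels; other sample rates are rejected.
    """
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz, got {sample_rate} Hz; resample first"
        )
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]         # (samples,) -> (1, samples)
    if audio.ndim != 2:
        raise ValueError("waveform must be 1-D or 2-D")
    if audio.shape[0] == 1:
        audio = np.repeat(audio, 2, axis=0)  # duplicate mono to left and right
    elif audio.shape[0] != 2:
        raise ValueError(f"expected 1 or 2 channels, got {audio.shape[0]}")
    return audio

mono = np.zeros(44100, dtype=np.float32)     # one second of silence
stereo = prepare_input(mono, 44100)
print(stereo.shape)  # (2, 44100)
```

For audio at another sample rate, `torchaudio.functional.resample(waveform, orig_freq, new_freq)` is one way to convert to 44.1 kHz before calling a helper like this.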
Training Details
- Training Data: Large-scale mastering dataset (raw/mastered pairs)
- Training Duration: 200 total epochs (100 Phase 1 + 100 Phase 2)
- Hardware: NVIDIA B200 GPUs (8-GPU DDP)
- Batch Size: 256
Intended Use
SIREN-MASTER is designed for:
- Automatic audio mastering
- Mix enhancement and polish
- Reference-quality output preparation
- Demo/pre-production mastering
- Research in neural audio enhancement
What SIREN-MASTER Learns
The model captures mastering techniques including:
- EQ adjustments - Tonal balance and clarity
- Compression - Dynamic range control
- Limiting - Loudness maximization
- Stereo enhancement - Width and imaging
- Harmonic saturation - Warmth and presence
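One rough way to observe some of these effects on a given output is to compare simple level statistics before and after processing. The sketch below computes peak level, RMS level, crest factor, and left/right correlation. `level_stats` is a hypothetical helper, the hard clip is only a crude stand-in for a limiter, and none of this is an evaluation protocol used by the SIREN team.

```python
import numpy as np

def level_stats(audio, eps=1e-12):
    """Peak, RMS, crest factor (dB) and L/R correlation for (2, samples) audio."""
    audio = np.asarray(audio, dtype=np.float64)
    peak = np.max(np.abs(audio))
    rms = np.sqrt(np.mean(audio ** 2))
    peak_db = 20.0 * np.log10(peak + eps)
    rms_db = 20.0 * np.log10(rms + eps)
    correlation = float(np.corrcoef(audio[0], audio[1])[0, 1])
    return {
        "peak_dbfs": float(peak_db),
        "rms_dbfs": float(rms_db),
        "crest_db": float(peak_db - rms_db),  # lower means denser dynamics
        "lr_correlation": correlation,        # 1.0 means identical channels
    }

# Synthetic test signal: one second of a 440 Hz sine in both channels.
n = np.arange(44100)
sine = 0.5 * np.sin(2.0 * np.pi * 440.0 * n / 44100.0)
before = np.stack([sine, sine])

# Crude limiter stand-in (assumption): boost by 2x, then hard-clip to the same peak.
after = np.clip(2.0 * before, -0.5, 0.5)

stats_before = level_stats(before)
stats_after = level_stats(after)
print(stats_before["crest_db"], stats_after["crest_db"])
```

Compression and limiting generally show up as a lower crest factor at a similar peak level, while stereo widening tends to lower the left/right correlation.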
Limitations
- Optimized for 44.1 kHz sample rate
- Best results with full mixes (not individual stems)
- Mastering style reflects training data aesthetics
- Not a replacement for genre-specific mastering
License
Apache 2.0
Citation
If you use SIREN-MASTER in your research, please cite:
```bibtex
@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}
```
Contact
For questions and feedback, please open an issue on the model repository.