license: apache-2.0
tags:
  - audio
  - music
  - audio-processing
  - mastering
  - enhancement
  - flow-matching
  - siren
pipeline_tag: audio-to-audio
library_name: pytorch

SIREN-MASTER

Neural Audio Enhancement and Mastering with Flow Matching

SIREN-MASTER is part of the SIREN Audio Suite - a family of neural audio processing models designed for professional music production workflows.

Model Description

SIREN-MASTER enhances and masters audio using a Flow Matching architecture. Trained on pairs of raw mixes and professionally mastered versions, the model learns the transformation from one to the other, and with it the processing choices mastering engineers made on that material.

Key capabilities:

Automatic mastering - Professional-quality mastering in one pass
Audio enhancement - Improve clarity, punch, and presence
Dynamic processing - Intelligent compression and limiting
Tonal balance - Optimal frequency distribution
Stereo imaging - Enhanced width and depth

Architecture

Component	Details
Base Architecture	Flow Matching (Continuous Normalizing Flow)
Model Size	40 MB
Training Phases	2 (Foundation + Enhancement)
Sample Rate	44.1 kHz

Flow Matching provides:

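Stable training - A simple regression objective on a velocity field, which tends to train more stably than diffusion objectives
Fast inference - Typically needs fewer sampling steps than diffusion models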
High fidelity - Excellent audio quality preservation
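
The step-count point can be illustrated with a toy sampler. Flow matching models are sampled by integrating a learned velocity field from t = 0 to t = 1 with an ODE solver; the sketch below uses fixed-step Euler. This card does not state which solver or how many steps SIREN-MASTER uses, so `euler_sample`, `toy_velocity`, and the step counts are illustrative assumptions, and the toy field stands in for the trained network.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a trained network (assumption, not the real model):
# for a single (start, target) pair on a straight-line path, the ideal
# velocity is the constant vector target - start.
start = [0.0, 0.5, -0.25]
target = [0.2, 0.4, -0.1]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [b - a for a, b in zip(start, target)]


few_steps = euler_sample(toy_velocity, start, num_steps=4)
many_steps = euler_sample(toy_velocity, start, num_steps=64)
```

With a perfectly straight path, 4 steps and 64 steps land on the same point. A trained model's field is only approximately straight, so the usable step count has to be found empirically.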

Training Pipeline

SIREN-MASTER was trained in two phases:

Phase 1: Foundation (100 epochs)
- Learn basic audio transformations
- Build robust feature representations
Phase 2: Enhancement (100 epochs)
- Fine-tune on mastering pairs
- Learn professional mastering aesthetics
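
As a rough sketch of the objective such training typically uses, flow matching regresses a network's predicted velocity onto a target velocity along a path between two endpoints. The card does not say whether the path starts from the raw mix or from noise conditioned on it, nor whether the model works on waveforms or another representation. The version below assumes a straight line from raw samples to mastered samples, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def flow_matching_pair(source: Sequence[float], target: Sequence[float],
                       t: float) -> Tuple[List[float], List[float]]:
    """Return (x_t, v_target) for the straight path x_t = (1 - t)*source + t*target."""
    x_t = [(1.0 - t) * s + t * y for s, y in zip(source, target)]
    v_target = [y - s for s, y in zip(source, target)]  # d/dt of the path
    return x_t, v_target


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, the regression loss on the velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


# Tiny made-up "raw" and "mastered" snippets, for illustration only.
raw = [0.10, -0.20, 0.05, 0.30]
mastered = [0.15, -0.35, 0.10, 0.45]

rng = random.Random(0)
t = rng.random()  # one random time in [0, 1) per training example
x_t, v_target = flow_matching_pair(raw, mastered, t)

# A real network would predict a velocity from (x_t, t); here we only
# compare two stand-in predictions against the target.
loss_if_perfect = mse(v_target, v_target)
loss_if_zero = mse([0.0] * len(raw), v_target)
```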

The SIREN Family

SIREN-MASTER is part of a suite of audio AI models:

Model	Purpose
SIREN-FX	Neural audio effects
SIREN-FIX	Audio restoration and repair
SIREN-MASTER	Audio enhancement and mastering (this model)
SIREN-STEER	Steerable audio transformations
SIREN-SEPARATE	Source separation
SIREN-TRANSCRIBE	Music analysis (key, tempo, beats)

Usage

import torch
import torchaudio

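# Load the checkpoint on CPU; the weights are stored under 'model_state_dict'
checkpoint = torch.load('siren_master.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# model_state is a state dict: pass it to load_state_dict() on an instance
# of the SIREN-MASTER network (the model class is not shown in this card).

# Model expects stereo audio at 44.1 kHz
# Input: raw mix
# Output: mastered audio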
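
Since the model expects stereo input at 44.1 kHz, a quick pre-flight check on input files can catch mismatches before inference. The following is a minimal sketch using only Python's standard `wave` module (PCM WAV files only); `check_wav_format` and `write_silent_wav` are hypothetical helpers, not part of SIREN-MASTER.

```python
import os
import tempfile
import wave
from typing import List

EXPECTED_RATE = 44100      # Hz, from the model card
EXPECTED_CHANNELS = 2      # stereo, from the model card


def check_wav_format(path: str) -> List[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channel(s), expected {EXPECTED_CHANNELS}")
    return problems


def write_silent_wav(path: str, rate: int, channels: int, frames: int = 100) -> None:
    """Write a short silent 16-bit PCM file, used here only to demo the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * frames)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silent_wav(good, rate=44100, channels=2)
    write_silent_wav(bad, rate=48000, channels=1)
    good_problems = check_wav_format(good)
    bad_problems = check_wav_format(bad)
```

Files that fail the check would need resampling or channel conversion before being passed to the model.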

Training Details

Training Data: Large-scale mastering dataset (raw/mastered pairs)
Training Duration: 200 total epochs (100 Phase 1 + 100 Phase 2)
Hardware: NVIDIA B200 GPUs (8-GPU DDP)
Batch Size: 256

Intended Use

SIREN-MASTER is designed for:

Automatic audio mastering
Mix enhancement and polish
Reference-quality output preparation
Demo/pre-production mastering
Research in neural audio enhancement

What SIREN-MASTER Learns

The model captures mastering techniques including:

EQ adjustments - Tonal balance and clarity
Compression - Dynamic range control
Limiting - Loudness maximization
Stereo enhancement - Width and imaging
Harmonic saturation - Warmth and presence
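
One way to inspect what compression and limiting do to a signal, independent of the model, is to compare peak level, RMS level, and crest factor (peak minus RMS, in dB) before and after processing. The helpers below are a hypothetical pure-Python sketch rather than anything shipped with SIREN-MASTER. A hard clipper stands in for a limiter, and crest factor is only a crude proxy for perceived loudness.

```python
import math
from typing import Sequence


def peak_dbfs(samples: Sequence[float]) -> float:
    """Peak level in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def crest_factor_db(samples: Sequence[float]) -> float:
    """Peak-to-RMS ratio in dB; lower values mean a denser, less dynamic signal."""
    return peak_dbfs(samples) - rms_dbfs(samples)


SAMPLE_RATE = 44100
# One second of a full-scale 441 Hz sine (exactly 100 samples per cycle).
sine = [math.sin(2 * math.pi * 441 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
# Hard clipping at half scale as a crude stand-in for a limiter.
clipped = [max(-0.5, min(0.5, s)) for s in sine]

sine_crest = crest_factor_db(sine)        # about 3.01 dB for a pure sine
clipped_crest = crest_factor_db(clipped)  # smaller: peaks reduced relative to RMS
```

Running the same measurements on a raw mix and on the model's output gives a quick, if rough, picture of how much dynamic range was traded for level.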

Limitations

Optimized for a 44.1 kHz sample rate
Best results with full mixes (not individual stems)
Mastering style reflects training data aesthetics
Not a replacement for genre-specific mastering

License

Apache 2.0

Citation

If you use SIREN-MASTER in your research, please cite:

@software{siren_master_2026,
  title={SIREN-MASTER: Neural Audio Mastering with Flow Matching},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-master}
}

Contact

For questions and feedback, please open an issue on the model repository.