SIREN-STEER
Steerable Neural Audio Effects with Temporal Convolutional Networks
SIREN-STEER is part of the SIREN Audio Suite - a family of neural audio processing models designed for professional music production workflows.
Model Description
SIREN-STEER provides controllable audio effect transformations using a Temporal Convolutional Network (TCN) architecture. The model learns to apply effects in a steerable manner, allowing fine-grained control over effect intensity and character.
Key capabilities:
- Steerable intensity - Control effect strength from 0% to 100%
- Style transfer - Transfer sonic characteristics between tracks
- Effect modeling - Learn audio effect transformations from paired examples
- Real-time capable - Efficient architecture for low-latency processing
Architecture
| Component | Details |
|---|---|
| Base Architecture | TCN (Temporal Convolutional Network) |
| Receptive Field | Exponentially growing with depth |
| Parameters | ~33K (lightweight) |
| Sample Rate | 44.1 kHz |
| Latency | < 5 ms |
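To make the "Receptive Field" and "Latency" rows concrete, here is a small arithmetic sketch. For a stack of dilated convolutions with kernel size k and one layer per dilation d_i, the receptive field is 1 + (k - 1) * sum(d_i), so doubling the dilation at each layer makes it grow exponentially with depth. The kernel size and dilation schedule below are hypothetical, chosen only for illustration; they are not the published SIREN-STEER configuration.

```python
SAMPLE_RATE_HZ = 44_100  # from the architecture table


def receptive_field(kernel_size, dilations):
    """Receptive field in samples of stacked dilated convolutions (one layer per dilation)."""
    return 1 + (kernel_size - 1) * sum(dilations)


def samples_in(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of audio samples that span duration_ms at the given sample rate."""
    return duration_ms * sample_rate_hz / 1000.0


# Hypothetical configuration: kernel size 3, dilations 1, 2, 4, ...
kernel_size = 3
for depth in (2, 4, 8, 10):
    dilations = [2 ** i for i in range(depth)]
    rf = receptive_field(kernel_size, dilations)
    print(f"depth={depth:2d}  receptive field={rf:5d} samples "
          f"({rf / SAMPLE_RATE_HZ * 1000:.2f} ms of context)")

# A 5 ms budget at 44.1 kHz covers this many samples of audio:
print(samples_in(5))  # 220.5
```

Under these assumptions, ten layers already see 2047 past samples (about 46 ms of context). Because the convolutions are causal, this context lies entirely in the past, so a long receptive field does not by itself add lookahead delay.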
The TCN architecture provides:
- Causal convolutions - No future information leakage
- Dilated convolutions - Large receptive field with few parameters
- Residual connections - Stable gradient flow
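The three properties above can be illustrated with a dependency-free sketch. This is not the SIREN-STEER implementation: the function names are hypothetical, the weights are arbitrary, and the nonlinearity and conditioning that a real TCN block would include are omitted. It only shows what "causal", "dilated", and "residual" mean on a single channel.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation], with x[t] = 0 for t < 0.

    Only current and past samples are read, so the output at time t
    never depends on inputs later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start are treated as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def residual_block(x, weights, dilation):
    """Add the convolution output back onto the input (skip connection)."""
    conv = causal_dilated_conv(x, weights, dilation)
    return [xi + ci for xi, ci in zip(x, conv)]


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(causal_dilated_conv(impulse, [0.5, 0.25, 0.125], dilation=2))
# [0.5, 0.0, 0.25, 0.0, 0.125, 0.0]
```

With a dilation of 2, the three taps land on samples 0, 2, and 4, so three weights cover a span of five samples. Stacking such layers with growing dilations is how a TCN reaches a large receptive field with few parameters.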
Supported Effects
SIREN-STEER was trained on the IDMT-SMT-AUDIO-EFFECTS dataset covering:
- Distortion / Overdrive
- Reverb
- Delay / Echo
- Chorus
- Flanger
- Phaser
- Tremolo
- Vibrato
- Compression
- EQ
The SIREN Family
The SIREN Audio Suite comprises the following models:
| Model | Purpose |
|---|---|
| SIREN-FX | Neural audio effects |
| SIREN-FIX | Audio restoration and repair |
| SIREN-MASTER | Audio enhancement and mastering |
| SIREN-STEER | Steerable audio transformations (this model) |
| SIREN-SEPARATE | Source separation |
| SIREN-TRANSCRIBE | Music analysis (key, tempo, beats) |
Usage
import torch
import torchaudio
# Load model
checkpoint = torch.load('siren_steer.pt', map_location='cpu')
# Model expects mono audio at 44.1 kHz
# Input shape: (batch, samples)
# Conditioning: effect intensity 0.0-1.0
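The comments above describe the expected inputs but not how to produce them. The following is a minimal sketch of input preparation, assuming a `(channels, samples)` waveform such as `torchaudio.load` returns. The helper name `prepare_inputs`, the percent-to-fraction mapping, and the `(batch,)` shape of the conditioning tensor are assumptions made for illustration. This card does not document the layout of the checkpoint or the model's forward signature, so the sketch stops before running the model.

```python
import torch

TARGET_SAMPLE_RATE = 44_100  # the model expects 44.1 kHz audio


def prepare_inputs(waveform, sample_rate, intensity_percent):
    """Return (mono, conditioning) for a (channels, samples) waveform.

    Hypothetical helper: the conditioning shape (batch,) is an assumption.
    """
    if sample_rate != TARGET_SAMPLE_RATE:
        # Imported lazily so the helper works without torchaudio for 44.1 kHz input
        import torchaudio.functional as AF
        waveform = AF.resample(waveform, sample_rate, TARGET_SAMPLE_RATE)

    # Downmix to mono by averaging channels: (1, samples), i.e. (batch, samples)
    mono = waveform.mean(dim=0, keepdim=True)

    # Map 0-100% to the 0.0-1.0 conditioning range and clamp out-of-range values
    intensity = min(max(intensity_percent / 100.0, 0.0), 1.0)
    conditioning = torch.tensor([intensity], dtype=mono.dtype)
    return mono, conditioning


stereo = torch.randn(2, TARGET_SAMPLE_RATE)  # one second of stand-in stereo audio
mono, conditioning = prepare_inputs(stereo, TARGET_SAMPLE_RATE, intensity_percent=75)
print(mono.shape, conditioning)
```

In practice the waveform and its sample rate would come from `torchaudio.load(path)`, which returns both values.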
Training Details
- Training Data: IDMT-SMT-AUDIO-EFFECTS (23,352 effect pairs)
- Training Duration: 200 epochs
- Hardware: NVIDIA B200 GPU
- Final Validation Loss: 1.81
Intended Use
SIREN-STEER is designed for:
- Controllable audio effect application
- Effect intensity interpolation
- Style transfer between audio tracks
- Real-time audio processing plugins
- Research in steerable audio transformations
Lightweight Design
At only ~33K parameters, SIREN-STEER is specifically designed for:
- Edge deployment - Runs on CPUs and mobile devices
- Plugin integration - Minimal memory footprint
- Real-time processing - Sub-5 ms latency
- Batch processing - Handle many tracks simultaneously
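As a rough check on the "minimal memory footprint" claim, the weight storage implied by ~33K parameters can be estimated directly. This is a back-of-the-envelope sketch: the stored precision of the checkpoint is not stated in this card, so the data types below are assumptions, and activation buffers and framework overhead are not counted.

```python
PARAM_COUNT = 33_000  # approximate, from the "~33K" figure above

# Bytes per parameter for some common storage types (assumed, not documented)
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_bytes(param_count, dtype):
    """Bytes needed to store param_count weights in the given dtype."""
    return param_count * BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    kib = weight_bytes(PARAM_COUNT, dtype) / 1024
    print(f"{dtype}: {kib:.1f} KiB")
```

At 32-bit precision this comes to roughly 129 KiB of weights, which is small enough to fit comfortably in the cache of most CPUs.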
Limitations
- Optimized for 44.1 kHz sample rate
- Best results with mono input
- Effect quality depends on training distribution
License
Apache 2.0
Citation
If you use SIREN-STEER in your research, please cite:
@software{siren_steer_2026,
title={SIREN-STEER: Steerable Neural Audio Effects},
author={SIREN Team},
year={2026},
url={https://huggingface.co/hilarl/siren-steer}
}
Contact
For questions and feedback, please open an issue on the model repository.