
SIREN-STEER

Steerable Neural Audio Effects with Temporal Convolutional Networks

SIREN-STEER is part of the SIREN Audio Suite - a family of neural audio processing models designed for professional music production workflows.

Model Description

SIREN-STEER provides controllable audio effect transformations using a Temporal Convolutional Network (TCN) architecture. The model learns to apply effects in a steerable manner, allowing fine-grained control over effect intensity and character.

Key capabilities:

  • Steerable intensity - Control effect strength from 0% to 100%
  • Style transfer - Transfer sonic characteristics between tracks
  • Effect modeling - Learn any audio effect transformation
  • Real-time capable - Efficient architecture for low-latency processing
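The card does not say how the intensity value reaches the network. One common way to make a TCN steerable by a scalar control is feature-wise linear modulation (FiLM), where each feature channel is scaled and shifted by amounts computed from the control value. The pure-Python sketch below illustrates only that general idea. The function `film` and its weight arguments are hypothetical, and SIREN-STEER may use a different conditioning mechanism.

```python
# Hypothetical illustration of FiLM-style conditioning on an intensity in [0, 1].
# Not SIREN-STEER's documented mechanism; the weights are hand-picked, not learned.
from typing import List

def film(features: List[List[float]], intensity: float,
         w_gamma: List[float], b_gamma: List[float],
         w_beta: List[float], b_beta: List[float]) -> List[List[float]]:
    """Return y[c][t] = gamma_c * x[c][t] + beta_c, where gamma_c and beta_c
    are affine functions of the scalar intensity."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    out = []
    for c, channel in enumerate(features):
        gamma = w_gamma[c] * intensity + b_gamma[c]  # per-channel scale
        beta = w_beta[c] * intensity + b_beta[c]     # per-channel shift
        out.append([gamma * x + beta for x in channel])
    return out

# Two channels, three time steps, intensity 0.5
feats = [[1.0, 2.0, 0.0], [0.5, -0.5, 1.0]]
print(film(feats, 0.5, w_gamma=[2.0, 0.0], b_gamma=[1.0, 1.0],
           w_beta=[0.0, 1.0], b_beta=[0.0, 0.0]))
# [[2.0, 4.0, 0.0], [1.0, 0.0, 1.5]]
```

In a trained network these weights would be learned, so a single intensity input can scale and shift every channel differently.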

Architecture

Component         | Details
Base Architecture | TCN (Temporal Convolutional Network)
Receptive Field   | Grows exponentially with depth
Parameters        | ~33K (lightweight)
Sample Rate       | 44.1 kHz
Latency           | < 5 ms
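The exponentially growing receptive field follows from doubling the dilation at each layer. For causal convolutions with kernel size k and dilations d_1, ..., d_L, one output sample depends on 1 + (k - 1)(d_1 + ... + d_L) input samples, which equals 1 + (k - 1)(2^L - 1) when the dilations are 1, 2, 4, ..., 2^(L-1). The sketch below evaluates this formula. The kernel size of 3 and the 10-layer schedule are assumptions for illustration, since the card does not list SIREN-STEER's layer configuration.

```python
from typing import List

def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples that influence one output sample of a stack of
    causal dilated 1-D convolutions (one entry in `dilations` per conv layer)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def samples_to_ms(n_samples: int, sample_rate: int = 44_100) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate

# Illustrative configuration (assumed): kernel size 3, dilations 1, 2, 4, ..., 512
dilations = [2 ** i for i in range(10)]
rf = receptive_field(3, dilations)
print(rf, round(samples_to_ms(rf), 1))  # 2047 46.4
```

Because the convolutions are causal, this window lies entirely in the past: it determines how much history the model can draw on, not how long a listener waits. Latency in a causal model would instead be governed by block size and compute time.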

The TCN architecture provides:

  • Causal convolutions - No future information leakage
  • Dilated convolutions - Large receptive field with few parameters
  • Residual connections - Stable gradient flow
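These properties can be seen in a single layer. The sketch below implements one causal dilated convolution with a residual connection in plain Python: output sample t reads only x[t], x[t - d], x[t - 2d], ..., and the input is added back to the convolution output. It omits the channel mixing, nonlinearity, and learned weights of a real TCN block; the tap values are arbitrary.

```python
from typing import List

def causal_dilated_conv(x: List[float], taps: List[float], dilation: int) -> List[float]:
    """y[t] = sum_j taps[j] * x[t - j * dilation]; samples before t = 0 count as zero.
    Only present and past inputs are read, so the layer is causal."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(taps):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def residual_layer(x: List[float], taps: List[float], dilation: int) -> List[float]:
    """Residual connection: output = input + conv(input)."""
    return [xi + ci for xi, ci in zip(x, causal_dilated_conv(x, taps, dilation))]

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(residual_layer(impulse, taps=[0.5, 0.25], dilation=2))
# [1.5, 0.0, 0.25, 0.0, 0.0, 0.0]
```

Changing a future input sample leaves every earlier output unchanged, which is what "no future information leakage" means in practice.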

Supported Effects

SIREN-STEER was trained on the IDMT-SMT-AUDIO-EFFECTS dataset covering:

  • Distortion / Overdrive
  • Reverb
  • Delay / Echo
  • Chorus
  • Flanger
  • Phaser
  • Tremolo
  • Vibrato
  • Compression
  • EQ

The SIREN Family

SIREN-STEER is part of a suite of audio AI models:

Model            | Purpose
SIREN-FX         | Neural audio effects
SIREN-FIX        | Audio restoration and repair
SIREN-MASTER     | Audio enhancement and mastering
SIREN-STEER      | Steerable audio transformations (this model)
SIREN-SEPARATE   | Source separation
SIREN-TRANSCRIBE | Music analysis (key, tempo, beats)

Usage

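import torch
import torchaudio

# Load the checkpoint on CPU. Its layout (full module or state_dict) is not
# documented in this card, so inspect it before building a model around it.
checkpoint = torch.load('siren_steer.pt', map_location='cpu')

# Model expects mono audio at 44.1 kHz ('input.wav' is a placeholder path)
waveform, sr = torchaudio.load('input.wav')     # shape: (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)   # downmix to mono: (1, samples)
if sr != 44100:
    waveform = torchaudio.functional.resample(waveform, sr, 44100)

# Input shape: (batch, samples), here batch = 1
# Conditioning: effect intensity in [0.0, 1.0] (forward-call signature not documented here)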
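For plugin-style use, a causal model can run block by block if each call also receives the last R - 1 input samples of the previous block as context, where R is the receptive field. The sketch below shows this history-carrying pattern with a 3-tap causal filter standing in for the network. `stream` and `fir3` are hypothetical names, and the real SIREN-STEER forward call (including how intensity is passed) is not documented in this card.

```python
from typing import Callable, Iterable, List

Signal = List[float]

def stream(causal_model: Callable[[Signal], Signal],
           blocks: Iterable[Signal], receptive_field: int) -> Signal:
    """Process blocks one at a time, prepending R - 1 samples of history so each
    output matches what the model would produce on the whole signal at once."""
    context = receptive_field - 1
    history: Signal = [0.0] * context  # silence before the first block
    out: Signal = []
    for block in blocks:
        padded = history + list(block)
        y = causal_model(padded)
        out.extend(y[context:])                    # drop outputs belonging to the history
        history = padded[len(padded) - context:]   # last R - 1 inputs for the next call
    return out

def fir3(x: Signal) -> Signal:
    """Stand-in causal model with receptive field 3."""
    taps = [0.5, 0.3, 0.2]
    return [sum(w * x[t - j] for j, w in enumerate(taps) if t - j >= 0)
            for t in range(len(x))]

signal = [float(i % 5) for i in range(20)]
blocks = [signal[0:3], signal[3:4], signal[4:11], signal[11:20]]  # uneven block sizes
streamed = stream(fir3, blocks, receptive_field=3)
offline = fir3(signal)
print(all(abs(a - b) < 1e-12 for a, b in zip(streamed, offline)), len(streamed))
# True 20
```

With this pattern, the block size sets the buffering delay, while R only sets how much past audio is kept between calls.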

Training Details

  • Training Data: IDMT-SMT-AUDIO-EFFECTS (23,352 effect pairs)
  • Training Duration: 200 epochs
  • Hardware: NVIDIA B200 GPU
  • Final Validation Loss: 1.81

Intended Use

SIREN-STEER is designed for:

  • Controllable audio effect application
  • Effect intensity interpolation
  • Style transfer between audio tracks
  • Real-time audio processing plugins
  • Research in steerable audio transformations
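When intensity is automated in a plugin, jumping straight to a new value between blocks can cause audible stepping ("zipper" noise). Assuming the model can accept a time-varying intensity (the card does not confirm this), a host could spread each change across one block with a linear ramp, as in the sketch below. `IntensitySmoother` is a hypothetical helper, not part of the released code.

```python
from typing import List

class IntensitySmoother:
    """Hypothetical helper: turns a per-block intensity target into a per-sample
    linear ramp from the previous value, clamped to [0, 1]."""

    def __init__(self, initial: float = 0.0) -> None:
        self.current = self._clamp(initial)

    @staticmethod
    def _clamp(value: float) -> float:
        return min(1.0, max(0.0, value))

    def next_block(self, target: float, block_size: int) -> List[float]:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        target = self._clamp(target)
        step = (target - self.current) / block_size
        ramp = [self.current + step * (i + 1) for i in range(block_size)]
        ramp[-1] = target  # land exactly on the target despite rounding
        self.current = target
        return ramp

smoother = IntensitySmoother(0.0)
print(smoother.next_block(1.0, 4))  # [0.25, 0.5, 0.75, 1.0]
print(smoother.next_block(1.7, 2))  # target clamped to 1.0 -> [1.0, 1.0]
```

If the model takes only one intensity value per call, passing each block's final ramp value (or its mean) is a coarser alternative.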

Lightweight Design

At only ~33K parameters, SIREN-STEER is specifically designed for:

  • Edge deployment - Runs on CPUs and mobile devices
  • Plugin integration - Minimal memory footprint
  • Real-time processing - Sub-5 ms latency
  • Batch processing - Handle many tracks simultaneously
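The lightweight claims can be sanity-checked with simple arithmetic. The sketch below estimates weight memory for ~33K parameters, assuming 32-bit floats (the checkpoint's dtype is not stated), and the buffering delay for common block sizes at 44.1 kHz. Buffering is only one part of end-to-end latency; host and driver buffers and the model's compute time add to it.

```python
SAMPLE_RATE = 44_100  # Hz, from the architecture table
PARAMS = 33_000       # "~33K", approximate

def weight_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Raw weight storage; 4 bytes per parameter assumes float32."""
    return n_params * bytes_per_param

def buffer_latency_ms(block_size: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Time needed to collect one block of input before it can be processed."""
    return 1000.0 * block_size / sample_rate

def max_block_for_latency(budget_ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Largest block size whose buffering delay stays within budget_ms."""
    return int(budget_ms * sample_rate / 1000.0)

print(weight_bytes(PARAMS) / 1024)  # 128.90625 (KiB)
for block in (64, 128, 256):
    print(block, round(buffer_latency_ms(block), 2))  # 1.45, 2.9, 5.8 ms
print(max_block_for_latency(5.0))   # 220
```

By this estimate, a 256-sample block alone would exceed a 5 ms budget at 44.1 kHz, while 128 samples leaves headroom for compute.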

Limitations

  • Optimized for 44.1 kHz audio; resample other rates before processing
  • Best results with mono input
  • Effect quality depends on how closely the input matches the training distribution

License

Apache 2.0

Citation

If you use SIREN-STEER in your research, please cite:

@software{siren_steer_2026,
  title={SIREN-STEER: Steerable Neural Audio Effects},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-steer}
}

Contact

For questions and feedback, please open an issue on the model repository.
