SIREN-FX
Neural Audio Effects Processing with State Space Models
SIREN-FX is part of the SIREN Audio Suite, a family of neural audio processing models designed for professional music production workflows.
Model Description
SIREN-FX learns to model and apply complex audio effects using an S4 (structured state space) architecture. Unlike neural audio effect models that are limited to simple nonlinearities, SIREN-FX can capture time-dependent effects with a theoretically unlimited receptive field, which makes it well suited to:
- Reverb: room acoustics and spatial effects
- Delay: time-based echo and repetition
- Chorus/Flanger/Phaser: modulation effects
- Complex effect chains: multi-effect processing
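To give a sense of scale for the time-dependent effects listed above, the following sketch converts effect durations into sample counts at the model's 44.1 kHz sample rate. The durations are illustrative assumptions rather than values from the SIREN-FX training data, and `seconds_to_samples` is a hypothetical helper, not part of any SIREN API.

```python
SAMPLE_RATE = 44_100  # Hz, the sample rate stated for SIREN-FX


def seconds_to_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples spanned by a duration at the given sample rate."""
    return round(seconds * sample_rate)


# Illustrative durations (assumed for this example, not taken from the model card)
effect_spans = {
    "chorus modulation delay (20 ms)": 0.020,
    "slapback delay (120 ms)": 0.120,
    "reverb tail (2 s)": 2.0,
}

# How far back in the input each effect needs to "see"
context_needed = {
    name: seconds_to_samples(seconds) for name, seconds in effect_spans.items()
}

for name, n_samples in context_needed.items():
    print(f"{name}: {n_samples} samples of past input")
```

A two-second reverb tail already spans 88,200 samples, which is the regime where a long receptive field matters.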
Architecture
| Component | Details |
|---|---|
| Base Architecture | S4 (Structured State Space Sequence Model) |
| Model Dimension | 128 |
| Number of Layers | 8 |
| State Size | 64 |
| Parameters | ~1.5M |
| Sample Rate | 44.1 kHz |
The S4 architecture provides:
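- Unlimited receptive field: in principle, each output sample can depend on the entire preceding input, so effects with long tails can be modeled
- Linear time complexity: inference cost grows linearly with the number of samples
- Stable training: structured initialization of the state matrices keeps training well behaved on long sequences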
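The first two properties follow from how a state space layer computes its output. The sketch below is a simplified, real-valued diagonal state space model written with NumPy. It is not the S4 parameterization and not the SIREN-FX implementation. The function names, the choice of zero-order-hold discretization, and all parameter values except the state size (64) and sample rate (44.1 kHz) are assumptions made for illustration. It shows that a step-by-step recurrence, which costs one state update per sample, produces the same output as a causal convolution whose kernel is as long as the input.

```python
import numpy as np


def discretize_zoh(a, b, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM."""
    a_bar = np.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_recurrent(a_bar, b_bar, c, u):
    """Run x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = c . x_k (linear in len(u))."""
    x = np.zeros_like(a_bar)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = a_bar * x + b_bar * u_k
        y[k] = np.dot(c, x)
    return y


def ssm_kernel(a_bar, b_bar, c, length):
    """Impulse response K_j = sum_i c_i * a_bar_i**j * b_bar_i for j < length."""
    powers = a_bar[None, :] ** np.arange(length)[:, None]
    return powers @ (c * b_bar)


rng = np.random.default_rng(0)
state_size = 64                            # matches the State Size in the table
a = -np.linspace(1.0, 50.0, state_size)    # assumed stable (negative) poles
b = np.ones(state_size)
c = rng.standard_normal(state_size) / state_size
a_bar, b_bar = discretize_zoh(a, b, dt=1.0 / 44_100)

u = rng.standard_normal(512)               # stand-in for an audio excerpt
y_rec = ssm_recurrent(a_bar, b_bar, c, u)

# Same output via a causal convolution with a kernel spanning the whole input
kernel = ssm_kernel(a_bar, b_bar, c, len(u))
y_conv = np.convolve(u, kernel)[: len(u)]

print(np.allclose(y_rec, y_conv))
```

Because the kernel can be extended to any length, the receptive field is bounded only by how slowly the state decays, while the recurrent form keeps the per-sample cost constant.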
The SIREN Family
The suite comprises the following models:
| Model | Purpose |
|---|---|
| SIREN-FX | Neural audio effects (this model) |
| SIREN-FIX | Audio restoration and repair |
| SIREN-MASTER | Audio enhancement and mastering |
| SIREN-STEER | Steerable audio transformations |
| SIREN-SEPARATE | Source separation |
| SIREN-TRANSCRIBE | Music analysis (key, tempo, beats) |
Usage
```python
import torch
import torchaudio

# Load the checkpoint and extract the model weights
checkpoint = torch.load('siren_fx.pt', map_location='cpu')
model_state = checkpoint['model_state_dict']

# The model expects mono audio at 44.1 kHz
# Input shape:  (batch, 1, samples)
# Output shape: (batch, 1, samples)
```
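The snippet above states the expected input format but does not show how to get audio into it. The following is a minimal sketch of that preparation step, assuming the audio has already been decoded into a NumPy array shaped `(channels, samples)` or `(samples,)`. `prepare_input` is a hypothetical helper, and downmixing by averaging channels is an assumed choice, not something the model card prescribes.

```python
import numpy as np

EXPECTED_SAMPLE_RATE = 44_100  # Hz, per the model card


def prepare_input(waveform: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return audio shaped (batch=1, channels=1, samples) as float32.

    Assumes `waveform` is shaped (channels, samples) or (samples,).
    """
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample first"
        )
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[None, :]  # (samples,) -> (1, samples)
    if audio.ndim != 2:
        raise ValueError("waveform must be shaped (channels, samples) or (samples,)")
    mono = audio.mean(axis=0, keepdims=True)  # downmix -> (1, samples)
    return mono[None, ...]  # add batch dimension -> (1, 1, samples)


# One second of silent stereo audio as a stand-in for a decoded file
stereo = np.zeros((2, EXPECTED_SAMPLE_RATE), dtype=np.float32)
batch = prepare_input(stereo, EXPECTED_SAMPLE_RATE)
print(batch.shape)  # (1, 1, 44100)
```

The result can be converted with `torch.from_numpy(batch)` before it is passed to the model. Audio at other sample rates would need to be resampled beforehand, for example with `torchaudio.functional.resample`.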
Training Details
- Training Data: Large-scale audio effects dataset
- Training Duration: 200 epochs
- Hardware: NVIDIA B200 GPUs
- Final Validation Loss: 1.0482
Intended Use
SIREN-FX is designed for:
- Music production and post-production
- Audio effect modeling and emulation
- Creative sound design
- Research in neural audio processing
Limitations
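- Trained at a 44.1 kHz sample rate; audio at other rates should be resampled before processing
- Expects mono input; multichannel audio needs to be downmixed or split into channels first
- Reproduces only effects represented in the training data and may not generalize to effects outside it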
License
Apache 2.0
Citation
If you use SIREN-FX in your research, please cite:
```bibtex
@software{siren_fx_2026,
  title={SIREN-FX: Neural Audio Effects with State Space Models},
  author={SIREN Team},
  year={2026},
  url={https://huggingface.co/hilarl/siren-fx}
}
```
Contact
For questions and feedback, please open an issue on the model repository.