soundsol/audio-spatializer

+---
+license: apache-2.0
+tags:
+- audio
+- spatial-audio
+- 3d-audio
+- ambisonics
+- text-to-spatial
+library_name: pytorch
+---
+# Text-Guided Audio Spatializer
+A text-guided model that spatializes mono audio in 3D from a natural-language description of the source position and room. The result can be rendered as binaural stereo for headphones.
+## Model Description
+This model takes mono audio and text descriptions (e.g., "front-left, level, near, medium room, medium reverb") and generates First-Order Ambisonics (FOA) encoded spatial audio, which can be converted to binaural stereo for headphone listening.
+- **Architecture**: Transformer-based model with cross-attention between audio features and text embeddings.
+- **Training Data**: Synthetic spatial audio generated using room impulse responses and directional encoding.
+- **Sample Rate**: 24 kHz
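
To make the "directional encoding" idea concrete, here is a minimal sketch of how a mono signal can be panned into First-Order Ambisonics with the standard first-order gains. It is an illustration only: the function name `encode_foa` is hypothetical, and the channel order (ACN: W, Y, Z, X) and SN3D normalization are assumptions, since the model card does not state which convention the model or its training data use.

```python
import numpy as np

def encode_foa(mono, azimuth_deg, elevation_deg):
    """Pan a mono signal into FOA. Assumes ACN order (W, Y, Z, X) and SN3D gains."""
    az = np.deg2rad(azimuth_deg)    # 0 = front, positive = toward the left
    el = np.deg2rad(elevation_deg)  # 0 = level, positive = up
    gains = np.array([
        1.0,                      # W: omnidirectional component
        np.sin(az) * np.cos(el),  # Y: left-right axis
        np.sin(el),               # Z: up-down axis
        np.cos(az) * np.cos(el),  # X: front-back axis
    ])
    mono = np.asarray(mono, dtype=np.float64)
    return gains[:, None] * mono[None, :]  # shape: (4, num_samples)

# One second of a 440 Hz tone placed front-left (45 degrees) at ear level.
sr = 24000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
foa = encode_foa(tone, azimuth_deg=45.0, elevation_deg=0.0)
print(foa.shape)  # (4, 24000)
```

A source at ear level contributes nothing to the Z channel, and at 45 degrees it feeds the X and Y channels equally.
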
+## Usage
+```python
+import soundfile as sf
+import torch
+
+from spatializer.models.crossattn_transformer import CrossAttnSpatializer
+from spatializer.utils.foa import foa_to_stereo_simple
+
+# Load the trained checkpoint and switch to inference mode
+model = CrossAttnSpatializer.load_from_checkpoint("epoch=14-step=342.ckpt")
+model.eval()
+
+# Load the input audio; downmix to mono if the file has several channels
+audio, sr = sf.read("input.wav")
+if audio.ndim > 1:
+    audio = audio.mean(axis=1)
+
+# Spatialize according to the text description
+text = "front-left, level, near, medium room, medium reverb"
+with torch.no_grad():
+    foa_output = model.spatialize(audio, text)
+
+# Convert FOA to binaural stereo
+binaural = foa_to_stereo_simple(foa_output)
+
+# Save the output at the model's 24 kHz sample rate
+sf.write("output_binaural.wav", binaural.T, 24000)
+```
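
For readers curious what a simple FOA-to-stereo conversion can look like, the sketch below decodes with two virtual cardioid microphones pointing left and right. This is not the repository's `foa_to_stereo_simple`, whose internals are not described here; `foa_to_stereo_cardioid` is a hypothetical stand-in, and it assumes the ACN channel order (W, Y, Z, X) with SN3D normalization. It produces level differences only, without the HRTF filtering a full binaural renderer would apply.

```python
import numpy as np

def foa_to_stereo_cardioid(foa):
    """Decode FOA (assumed order W, Y, Z, X) with virtual cardioids at +/-90 degrees."""
    w, y = foa[0], foa[1]
    left = 0.5 * (w + y)   # cardioid facing left
    right = 0.5 * (w - y)  # cardioid facing right
    return np.stack([left, right])  # shape: (2, num_samples)

# A source panned hard left has Y equal to W and no X or Z content.
sig = np.ones(8)
foa_left = np.stack([sig, sig, np.zeros(8), np.zeros(8)])
stereo = foa_to_stereo_cardioid(foa_left)
print(stereo[0][0], stereo[1][0])  # 1.0 0.0
```
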
+## Spatial Parameters
+The model understands the following spatial parameters:
+- **Direction**: front, front-left, left, back-left, back, back-right, right, front-right
+- **Elevation**: down, level, up
+- **Distance**: near, mid, far
+- **Room Size**: small, medium, large
+- **Reverb**: dry, medium, wet
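
Because the description is free text, a small helper can guard against typos by checking each value against the lists above. The sketch below is a hypothetical convenience, not part of the package; it assumes the comma-separated layout of the example prompt ("front-left, level, near, medium room, medium reverb") is the format the model expects.

```python
# Vocabulary copied from the parameter list above.
SPATIAL_VOCAB = {
    "direction": ["front", "front-left", "left", "back-left",
                  "back", "back-right", "right", "front-right"],
    "elevation": ["down", "level", "up"],
    "distance": ["near", "mid", "far"],
    "room size": ["small", "medium", "large"],
    "reverb": ["dry", "medium", "wet"],
}

def build_prompt(direction, elevation, distance, room_size, reverb):
    """Validate each value and assemble the text description."""
    chosen = {
        "direction": direction,
        "elevation": elevation,
        "distance": distance,
        "room size": room_size,
        "reverb": reverb,
    }
    for name, value in chosen.items():
        if value not in SPATIAL_VOCAB[name]:
            allowed = ", ".join(SPATIAL_VOCAB[name])
            raise ValueError(f"unknown {name} {value!r}; expected one of: {allowed}")
    return f"{direction}, {elevation}, {distance}, {room_size} room, {reverb} reverb"

print(build_prompt("front-left", "level", "near", "medium", "medium"))
# front-left, level, near, medium room, medium reverb
```
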
+## Limitations
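+- Input audio is resampled to 24 kHz, so content above 12 kHz is lost
+- Works best with mono source material
+- The binaural output is intended for headphone playback; the spatial effect does not carry over to loudspeakers
+- The model was trained on synthetic data and may not capture all the acoustic nuances of real rooms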
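
If you prefer to prepare the input yourself rather than rely on the internal resampling, one possible preprocessing step is sketched below. It assumes NumPy and SciPy are installed; `to_mono_24k` is a hypothetical helper, and polyphase resampling via `scipy.signal.resample_poly` is a choice made for this example, not necessarily what the model does internally.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 24000  # sample rate stated in the model description

def to_mono_24k(audio, sr):
    """Downmix (frames, channels) audio to mono and resample it to 24 kHz."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels
    if sr == TARGET_SR:
        return audio
    g = gcd(TARGET_SR, sr)
    # Polyphase resampling applies an anti-aliasing filter before decimating.
    return resample_poly(audio, TARGET_SR // g, sr // g)

stereo_48k = np.zeros((48000, 2))  # one second of silent stereo at 48 kHz
print(to_mono_24k(stereo_48k, 48000).shape)  # (24000,)
```
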
+## Training Details
+- **Framework**: PyTorch Lightning
+- **Optimizer**: AdamW
+- **Epochs**: 15
+- **Checkpoint**: epoch=14-step=342.ckpt (version 3)
+## Citation
+```
+@misc{helix-spatializer-2025,
+  title={Text-Guided Audio Spatializer},
+  author={Your Name},
+  year={2025}
+}
+```