---
license: apache-2.0
tags:
- audio
- spatial-audio
- 3d-audio
- ambisonics
- text-to-spatial
library_name: pytorch
---

# Text-Guided Audio Spatializer

A text-guided spatial audio model that converts mono audio into 3D spatialized binaural audio based on natural language descriptions.

## Model Description

This model takes mono audio and a text description (e.g., "front-left, level, near, medium room, medium reverb") and generates First-Order Ambisonics (FOA) encoded spatial audio, which can be converted to binaural stereo for headphone listening.

**Architecture**: Transformer-based model with cross-attention between audio features and text embeddings.

**Training Data**: Synthetic spatial audio generated using room impulse responses and directional encoding.

**Sample Rate**: 24 kHz

## Usage

```python
import torch
import soundfile as sf

from spatializer.models.crossattn_transformer import CrossAttnSpatializer
from spatializer.utils.foa import foa_to_stereo_simple

# Load the model from the released checkpoint and switch to inference mode
model = CrossAttnSpatializer.load_from_checkpoint("epoch=14-step=342.ckpt")
model.eval()

# Load audio; soundfile returns (frames, channels) for multi-channel files
audio, sr = sf.read("input.wav")
if audio.ndim > 1:
    # The model expects mono input, so average the channels
    audio = audio.mean(axis=1)

# Spatialize with a text description of the desired placement
text = "front-left, level, near, medium room, medium reverb"
with torch.no_grad():
    foa_output = model.spatialize(audio, text)

# Convert FOA to binaural stereo
binaural = foa_to_stereo_simple(foa_output)

# Save output at the model's 24 kHz sample rate
sf.write("output_binaural.wav", binaural.T, 24000)
```

## Spatial Parameters

The model understands the following spatial parameters:

- **Direction**: front, front-left, left, back-left, back, back-right, right, front-right
- **Elevation**: down, level, up
- **Distance**: near, mid, far
- **Room Size**: small, medium, large
- **Reverb**: dry, medium, wet

## Limitations

- Input audio is resampled to 24 kHz
- Best results are obtained with mono source material
- Headphones are required for a proper spatial audio experience
- The model was trained on synthetic data and may not capture all acoustic nuances

## Training Details

- **Framework**: PyTorch Lightning
- **Optimizer**: AdamW
- **Epochs**: 15
- **Checkpoint**: `epoch=14-step=342.ckpt` (version 3)

## Citation

```
@misc{helix-spatializer-2025,
  title={Text-Guided Audio Spatializer},
  author={Your Name},
  year={2025}
}
```
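
The text prompt in the Usage example lists one value per spatial parameter, in the order direction, elevation, distance, room size, reverb. A small helper can assemble such prompts and reject values outside the documented vocabulary. This is a minimal sketch and not part of the `spatializer` package: `build_prompt` and `VOCAB` are hypothetical names, and the assumption that every room and reverb value takes the suffix "room" or "reverb" is inferred from the single example prompt ("medium room, medium reverb").

```python
# Hypothetical helper; the vocabulary is copied from the Spatial Parameters list.
VOCAB = {
    "direction": ["front", "front-left", "left", "back-left",
                  "back", "back-right", "right", "front-right"],
    "elevation": ["down", "level", "up"],
    "distance": ["near", "mid", "far"],
    "room": ["small", "medium", "large"],
    "reverb": ["dry", "medium", "wet"],
}


def build_prompt(direction, elevation, distance, room, reverb):
    """Return a comma-separated prompt; raise ValueError on unknown values."""
    chosen = {
        "direction": direction,
        "elevation": elevation,
        "distance": distance,
        "room": room,
        "reverb": reverb,
    }
    for name, value in chosen.items():
        if value not in VOCAB[name]:
            raise ValueError(f"{name}={value!r} is not one of {VOCAB[name]}")
    # Assumed phrasing: "<size> room" and "<amount> reverb", as in the example prompt.
    return f"{direction}, {elevation}, {distance}, {room} room, {reverb} reverb"


prompt = build_prompt("front-left", "level", "near", "medium", "medium")
print(prompt)
```

The resulting string can be passed as the `text` argument in the Usage example. Validating up front catches typos such as "middle" for "mid" before they reach the model.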
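
For readers new to First-Order Ambisonics, the sketch below illustrates what "FOA encoded" and "converted to stereo" mean in the simplest case: the textbook plane-wave encoding of a mono signal into four channels, followed by a stereo decode with two virtual cardioid microphones. It assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalization). The channel layout this model actually emits and the internals of `foa_to_stereo_simple` are not documented here, so `encode_foa` and `decode_stereo_cardioid` are illustrative names only, and the decode is a plain stereo approximation rather than an HRTF-based binaural render.

```python
import numpy as np


def encode_foa(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal as FOA, shape (4, n_samples), order W, Y, Z, X (assumed AmbiX)."""
    az = np.deg2rad(azimuth_deg)    # 0 = front, +90 = left
    el = np.deg2rad(elevation_deg)  # 0 = level, +90 = up
    mono = np.asarray(mono, dtype=np.float64)
    w = mono                                # omnidirectional component
    y = mono * np.sin(az) * np.cos(el)      # left-right component
    z = mono * np.sin(el)                   # up-down component
    x = mono * np.cos(az) * np.cos(el)      # front-back component
    return np.stack([w, y, z, x])


def decode_stereo_cardioid(foa):
    """Decode FOA to stereo using virtual cardioids aimed at +90 and -90 degrees."""
    w, y = foa[0], foa[1]
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return np.stack([left, right])


# One second of a 440 Hz tone at 24 kHz, placed front-left at ear level
t = np.arange(24000) / 24000.0
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
foa = encode_foa(tone, azimuth_deg=45.0, elevation_deg=0.0)
stereo = decode_stereo_cardioid(foa)
print(foa.shape, stereo.shape)
```

With the source at 45 degrees to the left, the left channel carries more energy than the right; at 90 degrees the right virtual cardioid points directly away from the source and its channel falls to zero.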