---
license: mit
tags:
- audio
- audio-super-resolution
- upscaling
- audiosr
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---

# AudioSR Models (Safetensors)

**Audio Super-Resolution: Upscale Any Audio to 48 kHz**

[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License

> Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).

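
For readers curious what such a conversion involves, here is a minimal sketch. It is not necessarily the procedure used for this mirror. It assumes the original checkpoint is a pickled dictionary whose weights may sit under a top-level `state_dict` key; the helper names `unwrap_state_dict` and `convert_to_safetensors` are hypothetical.

```python
from pathlib import Path


def unwrap_state_dict(checkpoint: dict) -> dict:
    """Return the weights mapping, unwrapping a top-level "state_dict" key if present."""
    inner = checkpoint.get("state_dict")
    return inner if isinstance(inner, dict) else checkpoint


def convert_to_safetensors(bin_path: str, out_path: str) -> None:
    """Convert a pickle-based PyTorch checkpoint to a single safetensors file."""
    # Imported lazily so the pure helper above works without torch installed.
    import torch
    from safetensors.torch import save_file

    # torch.load unpickles the file, so only run it on checkpoints you trust.
    checkpoint = torch.load(bin_path, map_location="cpu")
    state_dict = unwrap_state_dict(checkpoint)

    # safetensors stores tensors only and rejects tensors that share memory,
    # so keep the tensor entries and give each one its own contiguous copy.
    tensors = {
        name: value.detach().clone().contiguous()
        for name, value in state_dict.items()
        if isinstance(value, torch.Tensor)
    }
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    save_file(tensors, out_path)
```

Depending on the PyTorch version and on what the checkpoint contains besides tensors, `torch.load` may need additional arguments; that detail is left out of this sketch.
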
## Available Models

| Variant | Files | Size | Description |
|---------|-------|------|-------------|
| **basic** | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| **speech** | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |

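
As a rough illustration of how the two layouts in the table can be handled uniformly, the sketch below globs for a variant's files and merges all shards into one mapping. It assumes the files have already been downloaded into a local directory (`root`, a hypothetical path) that mirrors the repository layout; `find_weight_files` and `load_state_dict` are illustrative names, not part of AudioSR or Mæstræa.

```python
from pathlib import Path


def find_weight_files(root: str, variant: str) -> list[Path]:
    """List the safetensors files for a variant, sorted so shards load in order."""
    return sorted(Path(root, variant).glob(f"audiosr_{variant}*.safetensors"))


def load_state_dict(root: str, variant: str = "basic") -> dict:
    """Merge every shard of a variant into one name -> tensor mapping."""
    # Lazy import: this part needs the safetensors and torch packages.
    from safetensors.torch import load_file

    files = find_weight_files(root, variant)
    if not files:
        raise FileNotFoundError(f"no safetensors files for {variant!r} under {root}")

    state_dict = {}
    for path in files:
        # Each shard holds a disjoint subset of the tensors.
        state_dict.update(load_file(str(path), device="cpu"))
    return state_dict
```

The same call works for the single-file `basic` variant and the sharded `speech` variant, since a single file is just the one-shard case.
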
## What AudioSR Does

AudioSR uses latent diffusion to upscale any audio to 48 kHz, restoring high-frequency content that was lost to:

- Low sample rate recording (8 kHz, 16 kHz, 22 kHz → 48 kHz)
- Lossy compression (MP3, AAC artifacts)
- Bandwidth-limited audio

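
To make the first case concrete: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist frequency), so everything between the input's Nyquist frequency and 24 kHz is what super-resolution has to reconstruct. The short sketch below only does that arithmetic; the function names are illustrative.

```python
TARGET_RATE_HZ = 48_000  # output rate stated above


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def missing_band_hz(input_rate_hz: int, target_rate_hz: int = TARGET_RATE_HZ) -> tuple[float, float]:
    """Frequency range absent from the input that the target rate has room for."""
    return nyquist_hz(input_rate_hz), nyquist_hz(target_rate_hz)


for rate in (8_000, 16_000, 22_000):
    low, high = missing_band_hz(rate)
    print(f"{rate} Hz input: content from {low:.0f} Hz to {high:.0f} Hz is missing")
```
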
### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but take longer |
| `guidance_scale` | 1–10 | 3.5 | Guidance strength; higher values follow the conditioning more closely |
| `model_name` | `basic` / `speech` | `basic` | Which variant to use |

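
One way to keep calls within these ranges is to validate settings before running inference. The sketch below copies the ranges and defaults from the table; `SRParams` is a hypothetical helper and not part of the `audiosr` package.

```python
from dataclasses import dataclass

# Ranges and defaults copied from the table above.
DDIM_STEPS_RANGE = (10, 200)
GUIDANCE_SCALE_RANGE = (1.0, 10.0)
MODEL_NAMES = ("basic", "speech")


@dataclass(frozen=True)
class SRParams:
    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges instead of clamping silently.
        lo, hi = DDIM_STEPS_RANGE
        if not lo <= self.ddim_steps <= hi:
            raise ValueError(f"ddim_steps must be in [{lo}, {hi}], got {self.ddim_steps}")
        lo_g, hi_g = GUIDANCE_SCALE_RANGE
        if not lo_g <= self.guidance_scale <= hi_g:
            raise ValueError(f"guidance_scale must be in [{lo_g}, {hi_g}], got {self.guidance_scale}")
        if self.model_name not in MODEL_NAMES:
            raise ValueError(f"model_name must be one of {MODEL_NAMES}, got {self.model_name!r}")
```

For example, `SRParams(ddim_steps=100, model_name="speech")` passes validation, while `SRParams(ddim_steps=5)` raises `ValueError`.
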
### VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB (for longer audio)

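
A quick pre-flight check against these figures might look like the sketch below. The thresholds are the approximate values listed above, interpreted here as decimal gigabytes (an assumption); `classify_vram` and `detect_vram_bytes` are illustrative names.

```python
GB = 10 ** 9              # decimal gigabytes; the figures above are approximate
MIN_VRAM_GB = 4           # approximate minimum from the list above
RECOMMENDED_VRAM_GB = 6   # approximate recommendation for longer audio


def classify_vram(total_bytes: int) -> str:
    """Compare a GPU's total memory with the approximate figures above."""
    total_gb = total_bytes / GB
    if total_gb < MIN_VRAM_GB:
        return "below-minimum"
    if total_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


def detect_vram_bytes(device_index: int = 0) -> int:
    """Total memory of a CUDA device in bytes, or 0 when no CUDA device is available."""
    import torch  # lazy import so classify_vram works without torch installed

    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(device_index).total_memory
```

Calling `classify_vram(detect_vram_bytes())` gives a coarse answer; actual usage also depends on audio length and on whatever else is occupying the GPU.
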
## Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.

### Direct Usage

```python
import audiosr

# Build the model; model_name is "basic" or "speech".
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz with a fixed seed for the diffusion sampler.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50,
)
```

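
The snippet above stops at the returned `waveform`. As a sketch of one way to write the result to disk using only the standard library, the helper below saves mono float samples as a 16-bit, 48 kHz WAV file. The exact type and shape returned by `super_resolution` are not documented here, so this assumes the output can be flattened to a one-dimensional sequence of floats in [-1, 1] (for a NumPy array, something like `waveform.reshape(-1)`); `write_wav_48k` is a hypothetical helper.

```python
import sys
import wave
from array import array


def write_wav_48k(samples, path: str, sample_rate: int = 48_000) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM; return the frame count."""
    # Clip to [-1, 1] and scale to the signed 16-bit range.
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono (assumption)
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)
```
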
## Original Source

| Variant | Original Repo |
|---------|--------------|
| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |

## License

MIT, the same as the original AudioSR release.

## Credits

- **Model**: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
- **Paper**: [Versatile Audio Super Resolution](https://arxiv.org/abs/2309.07314)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)