---
license: mit
tags:
- audio
- audio-super-resolution
- upscaling
- audiosr
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---
# AudioSR Models (Safetensors)
**Audio Super-Resolution — Upscale Any Audio to 48kHz**
[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License
> Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. Intended for use with the [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
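
A conversion like the one described above can be sketched as follows. This is a minimal illustration, not necessarily the script used to produce the files in this repository. The function name `convert_to_safetensors` and the check for a nested `"state_dict"` key are assumptions; `torch.load` and `safetensors.torch.save_file` are the only library calls used.

```python
def convert_to_safetensors(bin_path: str, out_path: str) -> int:
    """Convert a pickled PyTorch checkpoint to one safetensors file.

    Hypothetical helper. Returns the number of tensors written.
    """
    # Imported lazily so the module can be inspected without torch installed.
    import torch
    from safetensors.torch import save_file

    # weights_only=True makes torch refuse arbitrary pickled objects.
    checkpoint = torch.load(bin_path, map_location="cpu", weights_only=True)

    # Assumption: some checkpoints nest their weights under "state_dict".
    state_dict = checkpoint
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        state_dict = checkpoint["state_dict"]

    # safetensors needs contiguous tensors that do not share memory,
    # so each tensor is cloned before saving. Non-tensor entries are dropped.
    tensors = {
        name: value.detach().clone().contiguous()
        for name, value in state_dict.items()
        if isinstance(value, torch.Tensor)
    }
    save_file(tensors, out_path)
    return len(tensors)


# Example call (paths are placeholders):
# convert_to_safetensors("pytorch_model.bin", "audiosr_basic.safetensors")
```

The safety benefit comes from the format itself: a safetensors file holds only tensor data and a JSON header, so loading it does not execute pickled code.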
## Available Models
| Variant | Files | Size | Description |
|---------|-------|------|-------------|
| **basic** | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| **speech** | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
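
Because the speech variant is split into shards, loading it by hand means merging several files. The sketch below assumes the repository has already been downloaded to a local folder with the layout shown in the table. The helper names `variant_files` and `load_variant` are hypothetical, and how the resulting state dict maps onto a model class is not covered here.

```python
from pathlib import Path

# File patterns taken from the table above, relative to the repo root.
VARIANT_PATTERNS = {
    "basic": "basic/audiosr_basic.safetensors",
    "speech": "speech/audiosr_speech-*.safetensors",
}


def variant_files(root: str, variant: str) -> list[Path]:
    """Return the safetensors files for one variant, sorted by name."""
    if variant not in VARIANT_PATTERNS:
        raise ValueError(f"unknown variant: {variant!r}")
    return sorted(Path(root).glob(VARIANT_PATTERNS[variant]))


def load_variant(root: str, variant: str) -> dict:
    """Merge every shard of a variant into a single state dict on the CPU."""
    # Imported lazily; requires the safetensors and torch packages.
    from safetensors.torch import load_file

    state_dict = {}
    for path in variant_files(root, variant):
        state_dict.update(load_file(str(path), device="cpu"))
    return state_dict
```

Sorting the shard paths keeps the merge order deterministic, although the order does not affect the result as long as tensor names are unique across shards.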
## What AudioSR Does
AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:
- Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
- Lossy compression (MP3, AAC artifacts)
- Bandwidth-limited audio
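
To make the sample-rate case concrete: a signal sampled at rate `fs` can only carry frequencies up to `fs / 2` (the Nyquist limit), so everything between the input's limit and the 24 kHz limit of 48 kHz audio has to be generated by the model. The short sketch below does that arithmetic; the function names are illustrative, and 22,050 Hz is assumed for the "22kHz" case.

```python
TARGET_RATE_HZ = 48_000  # output sample rate stated above


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate_hz / 2


def missing_band_hz(input_rate_hz: int,
                    target_rate_hz: int = TARGET_RATE_HZ) -> tuple[float, float]:
    """Frequency band absent from the input but present at the target rate."""
    return nyquist_hz(input_rate_hz), nyquist_hz(target_rate_hz)


for rate in (8_000, 16_000, 22_050):
    low, high = missing_band_hz(rate)
    print(f"{rate} Hz input: band from {low:.0f} Hz to {high:.0f} Hz must be generated")
```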
### Key Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
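| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but increase inference time |
| `guidance_scale` | 1–10 | 3.5 | Classifier-free guidance strength; higher values trade diversity for closer adherence to the conditioning |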
| `model_name` | basic/speech | basic | Which variant to use |
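
One way to keep calls within the ranges in this table is to validate settings before running inference. The sketch below is a hypothetical helper, not part of the `audiosr` package: the class name `AudioSRParams` and its method are assumptions, while the ranges and defaults are copied from the table.

```python
from dataclasses import dataclass

VARIANTS = ("basic", "speech")


@dataclass(frozen=True)
class AudioSRParams:
    """Hypothetical container for the parameters listed in the table."""

    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not 10 <= self.ddim_steps <= 200:
            raise ValueError(f"ddim_steps must be in 10-200, got {self.ddim_steps}")
        if not 1 <= self.guidance_scale <= 10:
            raise ValueError(f"guidance_scale must be in 1-10, got {self.guidance_scale}")
        if self.model_name not in VARIANTS:
            raise ValueError(f"model_name must be one of {VARIANTS}, got {self.model_name!r}")

    def sampling_kwargs(self) -> dict:
        """Keyword arguments for the sampling call (model_name is used at load time)."""
        return {"ddim_steps": self.ddim_steps, "guidance_scale": self.guidance_scale}


params = AudioSRParams(ddim_steps=100)
print(params.sampling_kwargs())
```

Keeping `model_name` out of `sampling_kwargs()` reflects the split between choosing a variant when the model is built and passing the sampling settings when it is run.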
### VRAM Requirements
- **Minimum**: ~4 GB
- **Recommended**: ~6 GB (for longer audio)
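
A quick pre-flight check against these figures might look like the sketch below. The thresholds are the approximate values listed above, the function names are hypothetical, and `torch.cuda.mem_get_info` reports free memory at the moment of the call, which other processes can change.

```python
MIN_VRAM_GB = 4.0          # "~4 GB" minimum from the list above
RECOMMENDED_VRAM_GB = 6.0  # "~6 GB" recommended for longer audio


def vram_tier(free_gb: float) -> str:
    """Classify free GPU memory against the documented requirements."""
    if free_gb >= RECOMMENDED_VRAM_GB:
        return "recommended"
    if free_gb >= MIN_VRAM_GB:
        return "minimum"
    return "insufficient"


def free_vram_gb() -> float:
    """Free memory on the current CUDA device in GB, or 0.0 without CUDA."""
    import torch  # imported lazily; requires PyTorch

    if not torch.cuda.is_available():
        return 0.0
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1e9


# Example: print(vram_tier(free_vram_gb()))
```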
## Usage with Mæstræa
These models are automatically downloaded by the Mæstræa AI Workstation backend.
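
For setups outside Mæstræa, a manual download of one variant could be sketched with `huggingface_hub.snapshot_download`. The repo ID `AEmotionStudio/audiosr-models` is an assumption inferred from this page's name and uploader, and the helper names are hypothetical; this is not the code the Mæstræa backend runs.

```python
from typing import Optional

ASSUMED_REPO_ID = "AEmotionStudio/audiosr-models"  # assumption, see note above


def variant_patterns(variant: str) -> list[str]:
    """Glob patterns selecting only one variant's weight files."""
    if variant not in ("basic", "speech"):
        raise ValueError(f"unknown variant: {variant!r}")
    return [f"{variant}/*.safetensors"]


def download_variant(variant: str,
                     repo_id: str = ASSUMED_REPO_ID,
                     local_dir: Optional[str] = None) -> str:
    """Download one variant and return the local snapshot folder."""
    # Imported lazily; requires the huggingface_hub package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )


# Example: folder = download_variant("basic")
```

Restricting the download with `allow_patterns` avoids fetching both 6.2 GB variants when only one is needed.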
### Direct Usage
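```python
import audiosr

# Load the general-purpose variant; pass "speech" for spoken-word material.
model = audiosr.build_model(model_name="basic")
# Upscale input.wav to 48 kHz using the default settings from Key Parameters.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50,
)
```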
## Original Source
| Variant | Original Repo |
|---------|--------------|
| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |
## License
MIT — same as the original AudioSR release.
## Credits
- **Model**: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
- **Paper**: [Versatile Audio Super Resolution](https://arxiv.org/abs/2309.07314)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)