---
license: mit
tags:
  - audio
  - audio-super-resolution
  - upscaling
  - audiosr
  - safetensors
  - maestraea
pipeline_tag: audio-to-audio
---

# AudioSR Models (Safetensors)

**Audio Super-Resolution — Upscale Any Audio to 48kHz**

[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License

> Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
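
A safetensors file begins with an 8-byte little-endian header length followed by a JSON header, so its contents can be inspected without unpickling anything. The following is a minimal sketch, using only the Python standard library, of how one might list the tensors in a converted checkpoint. `read_safetensors_header` and `summarize` are hypothetical helper names, not part of AudioSR or Mæstræa.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict.

    Only the header is read; no tensor data is loaded and no pickle
    code is executed.
    """
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing every tensor.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```

For example, `summarize(read_safetensors_header("basic/audiosr_basic.safetensors"))` would return the tensor names with their dtypes and shapes, assuming the file has been downloaded to that relative path.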

## Available Models

| Variant | Files | Size | Description |
|---------|-------|------|-------------|
| **basic** | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| **speech** | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |

## What AudioSR Does

AudioSR uses latent diffusion to upscale any audio to 48 kHz, regenerating high-frequency content that was lost to:

- Low-sample-rate recording (8 kHz, 16 kHz, 22 kHz → 48 kHz)
- Lossy compression (MP3, AAC artifacts)
- Bandwidth-limited audio
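
To make the first case concrete: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist limit), so everything between the input's limit and the 24 kHz limit of 48 kHz output has to be generated. The arithmetic can be sketched as below; `nyquist` and `missing_band` are illustrative names only.

```python
TARGET_SR = 48_000  # AudioSR output sample rate in Hz


def nyquist(sample_rate_hz):
    """Highest frequency (Hz) representable at this sample rate."""
    return sample_rate_hz / 2


def missing_band(input_sr_hz, target_sr_hz=TARGET_SR):
    """Return (low, high) in Hz of the band absent from the input."""
    return nyquist(input_sr_hz), nyquist(target_sr_hz)


for sr in (8_000, 16_000):
    low, high = missing_band(sr)
    print(f"{sr} Hz input: {low:.0f} Hz to {high:.0f} Hz must be generated")
```

For lossy-compressed or band-limited files, the effective cutoff is set by the encoder or the transmission channel rather than by the sample rate, so this calculation covers only the low-sample-rate case.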

### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
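| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps are slower but generally give higher quality |
| `guidance_scale` | 1–10 | 3.5 | Classifier-free guidance strength; higher values follow the conditioning more closely |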
| `model_name` | basic/speech | basic | Which variant to use |
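
One way to guard against out-of-range settings before starting a long run is a small check against the ranges in the table above. This is a sketch only: `validate_params` is a hypothetical helper, not part of the `audiosr` package, and the limits are simply the ones listed here.

```python
ALLOWED_MODELS = ("basic", "speech")


def validate_params(ddim_steps=50, guidance_scale=3.5, model_name="basic"):
    """Check settings against the documented ranges (hypothetical helper)."""
    if not 10 <= ddim_steps <= 200:
        raise ValueError(f"ddim_steps must be between 10 and 200, got {ddim_steps}")
    if not 1 <= guidance_scale <= 10:
        raise ValueError(f"guidance_scale must be between 1 and 10, got {guidance_scale}")
    if model_name not in ALLOWED_MODELS:
        raise ValueError(f"model_name must be one of {ALLOWED_MODELS}, got {model_name!r}")
    # Return the validated settings so they can be passed on unchanged.
    return {
        "ddim_steps": ddim_steps,
        "guidance_scale": guidance_scale,
        "model_name": model_name,
    }
```

The returned dictionary holds the same three settings, so `model_name` can be passed to `build_model` and the other two to `super_resolution`.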

### VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB (for longer audio)

## Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.

### Direct Usage

```python
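import audiosr

# Build the latent diffusion model; model_name is "basic" or "speech".
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz; a fixed seed makes the sampling reproducible.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50
)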
```

## Original Source

| Variant | Original Repo |
|---------|--------------|
| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |

## License

MIT — same as the original AudioSR release.

## Credits

- **Model**: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
- **Paper**: [Versatile Audio Super Resolution](https://arxiv.org/abs/2309.07314)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)