---
license: mit
tags:
- audio
- super-resolution
- audio-upscaling
- comfyui
- audio-sr
- audiosr
- versatile-audio-super-resolution
library_name: diffusers
---

# AudioSR Models for ComfyUI

Pre-trained AudioSR (Versatile Audio Super Resolution) models for use with the [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR) custom node.

<audio controls src="https://huggingface.co/drbaph/AudioSR/resolve/main/samples/speech_up_4.wav"></audio>
<audio controls src="https://huggingface.co/drbaph/AudioSR/resolve/main/samples/speech_audiosr_4.wav"></audio>

![ComfyUI_temp_bildo_00002_](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/ZMK6nkhj26kbLgRwJZqYp.png)


## Models

### audiosr_basic_fp32.safetensors
- **Purpose:** General audio super-resolution
- **Best for:** Music, sound effects, podcasts, mixed content
- **Format:** FP32 SafeTensors
- **Size:** ~5.9 GB

### audiosr_speech_fp32.safetensors
- **Purpose:** Speech/voice optimized super-resolution
- **Best for:** Voice recordings, vocals, speech content
- **Format:** FP32 SafeTensors
- **Size:** ~5.9 GB

## Usage

### Installation

1. Install [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR) via ComfyUI Manager
2. Download model(s) from this repository
3. Place the downloaded file(s) in `ComfyUI/models/AudioSR/`
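
Steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of the custom node: it assumes the `huggingface_hub` package is installed, that this repository's ID is `drbaph/AudioSR` (inferred from the sample URLs above), and that the model files sit at the repository root. The helper names and the `comfy_root` argument are hypothetical.

```python
from pathlib import Path

# Assumption: repo ID inferred from the sample audio URLs in this README.
REPO_ID = "drbaph/AudioSR"
MODEL_FILES = (
    "audiosr_basic_fp32.safetensors",
    "audiosr_speech_fp32.safetensors",
)


def model_dir(comfy_root):
    """Return the folder the models are placed in (step 3)."""
    return Path(comfy_root) / "models" / "AudioSR"


def download_model(comfy_root, filename):
    """Download one model file (~5.9 GB each) into ComfyUI/models/AudioSR/."""
    if filename not in MODEL_FILES:
        raise ValueError(f"unknown model file: {filename}")
    # Imported lazily so model_dir() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = model_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: the file is stored at the repository root.
    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=target)
```

Calling `download_model("/path/to/ComfyUI", "audiosr_speech_fp32.safetensors")` would fetch the speech model; adjust the path to your own ComfyUI install.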

### Quick Start

```
ComfyUI Workflow:
Load Audio → AudioSR → Preview/Save Audio
```

**Recommended Settings:**
- Steps: 50-100
- Guidance Scale: 3.5-5.0
- Model: Use `audiosr_speech_fp32.safetensors` for voice, `audiosr_basic_fp32.safetensors` for everything else
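
The recommendations above can be written down as a small helper, for example when preparing batch jobs. This is only an illustrative sketch: `pick_model` and `check_settings` are hypothetical names, not part of ComfyUI-AudioSR, and the content-type labels are assumptions.

```python
def pick_model(content_type):
    """Map a content type to the model file recommended above."""
    # Assumption: these labels are illustrative, not values the node accepts.
    if content_type in {"speech", "voice", "vocals"}:
        return "audiosr_speech_fp32.safetensors"
    return "audiosr_basic_fp32.safetensors"


def check_settings(steps, guidance_scale):
    """Return True when both values fall inside the recommended ranges."""
    return 50 <= steps <= 100 and 3.5 <= guidance_scale <= 5.0
```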

## What it does

AudioSR upscales low-quality audio to high-quality 48kHz output using latent diffusion. It:

- Resamples to 48kHz
- Enhances high frequencies
- Reduces compression artifacts
- Adds clarity and detail
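
To see why resampling alone is not enough, recall that a signal sampled at rate `r` carries no content above `r / 2` (the Nyquist frequency). The sketch below is plain arithmetic rather than AudioSR code; `missing_band_hz` is a hypothetical helper that reports the frequency band a 48 kHz output can represent but the input cannot, which is the band the model has to generate.

```python
def missing_band_hz(input_rate_hz, output_rate_hz=48_000):
    """Return (low, high) in Hz of the band absent from the input, or None."""
    input_nyquist = input_rate_hz / 2
    output_nyquist = output_rate_hz / 2
    if input_nyquist >= output_nyquist:
        return None  # the input already covers the full output band
    return (input_nyquist, output_nyquist)


print(missing_band_hz(16_000))  # (8000.0, 24000.0)
```

For a 16 kHz recording, everything between 8 kHz and 24 kHz is missing from the input.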

## Model Info

Based on [AudioSR: Versatile Audio Super-Resolution](https://arxiv.org/abs/2309.07314) by Haohe Liu et al.

Original repository: https://github.com/haoheliu/versatile_audio_super_resolution

**License:** MIT

## Hardware Requirements

- **GPU:** NVIDIA RTX 3060 or higher (6GB+ VRAM minimum)
- **RAM:** 12GB+ recommended
- **Input audio:** Works best when the input sample rate is above 8 kHz

## Credits

- **Research:** [Haohe Liu](https://github.com/haoheliu) et al.
- **Paper:** [AudioSR on arXiv](https://arxiv.org/abs/2309.07314)
- **ComfyUI Integration:** [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR)