---
license: mit
tags:
- audio
- super-resolution
- audio-upscaling
- comfyui
- audio-sr
- audiosr
- versatile-audio-super-resolution
library_name: diffusers
---

# AudioSR Models for ComfyUI

Pre-trained AudioSR (Versatile Audio Super Resolution) models for use with the [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR) custom node.

<audio controls src="https://huggingface.co/drbaph/AudioSR/resolve/main/samples/speech_up_4.wav"></audio>
<audio controls src="https://huggingface.co/drbaph/AudioSR/resolve/main/samples/speech_audiosr_4.wav"></audio>

## Models

### audiosr_basic_fp32.safetensors
- **Purpose:** General audio super-resolution
- **Best for:** Music, sound effects, podcasts, mixed content
- **Format:** FP32 SafeTensors
- **Size:** ~5.9 GB

### audiosr_speech_fp32.safetensors
- **Purpose:** Speech/voice optimized super-resolution
- **Best for:** Voice recordings, vocals, speech content
- **Format:** FP32 SafeTensors
- **Size:** ~5.9 GB

## Usage

### Installation

1. Install [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR) via ComfyUI Manager
2. Download the model(s) you need from this repository
3. Place the downloaded file(s) in `ComfyUI/models/AudioSR/`

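Steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of ComfyUI-AudioSR: it assumes the `huggingface_hub` package is installed, that the repository ID is `drbaph/AudioSR` (inferred from the sample URLs in this README), and that the model files sit at the repository root. The helper names `model_destination` and `download_model` are hypothetical.

```python
from pathlib import Path

# Assumption: repository ID inferred from the sample URLs in this README.
REPO_ID = "drbaph/AudioSR"
MODEL_FILES = (
    "audiosr_basic_fp32.safetensors",
    "audiosr_speech_fp32.safetensors",
)


def model_destination(comfyui_root: str) -> Path:
    """Return the model folder named in step 3, relative to a ComfyUI root."""
    return Path(comfyui_root) / "models" / "AudioSR"


def download_model(filename: str, comfyui_root: str = "ComfyUI") -> Path:
    """Download one model file (~5.9 GB) into <comfyui_root>/models/AudioSR/."""
    if filename not in MODEL_FILES:
        raise ValueError(f"unknown model file: {filename}")
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target_dir = model_destination(comfyui_root)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the file is stored at the root of the repository.
    local_path = hf_hub_download(
        repo_id=REPO_ID, filename=filename, local_dir=target_dir
    )
    return Path(local_path)
```

For example, `download_model("audiosr_speech_fp32.safetensors", comfyui_root="/path/to/ComfyUI")` would fetch the speech model, provided the assumptions above hold.
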
### Quick Start

```
ComfyUI Workflow:
Load Audio → AudioSR → Preview/Save Audio
```

**Recommended Settings:**
- Steps: 50-100
- Guidance Scale: 3.5-5.0
- Model: Use `audiosr_speech_fp32.safetensors` for voice, `audiosr_basic_fp32.safetensors` for everything else

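When driving the node from a script or keeping presets, the recommended settings can be captured in a small helper. This is only an illustrative sketch: `AudioSRSettings` and `pick_model` are hypothetical names, not types or functions from ComfyUI-AudioSR, and the defaults are simply the lower ends of the ranges listed above.

```python
from dataclasses import dataclass

# Ranges copied from the recommended settings above.
STEPS_RANGE = (50, 100)
GUIDANCE_RANGE = (3.5, 5.0)


def pick_model(is_speech: bool) -> str:
    """Speech model for voice content, basic model for everything else."""
    if is_speech:
        return "audiosr_speech_fp32.safetensors"
    return "audiosr_basic_fp32.safetensors"


@dataclass(frozen=True)
class AudioSRSettings:
    """Hypothetical preset container; defaults are arbitrary picks inside the ranges."""

    model: str
    steps: int = 50
    guidance_scale: float = 3.5

    def in_recommended_range(self) -> bool:
        """True when both steps and guidance scale fall inside the listed ranges."""
        steps_ok = STEPS_RANGE[0] <= self.steps <= STEPS_RANGE[1]
        guidance_ok = GUIDANCE_RANGE[0] <= self.guidance_scale <= GUIDANCE_RANGE[1]
        return steps_ok and guidance_ok


voice_preset = AudioSRSettings(model=pick_model(is_speech=True), steps=75)
```

Values outside the ranges are not necessarily wrong; the check only flags presets that leave the recommended window.
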
## What it does

AudioSR upscales low-quality audio to high-quality 48kHz output using latent diffusion. It:

- Resamples to 48kHz
- Enhances high frequencies
- Reduces compression artifacts
- Adds clarity and detail

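To make the high-frequency point concrete: a signal sampled at a given rate can only contain frequencies up to half that rate (the Nyquist limit), so plain resampling to 48kHz leaves everything above the original limit empty. The short calculation below shows the band that has to be generated rather than copied. The helper names are hypothetical and the code illustrates the arithmetic only, not AudioSR's internals.

```python
TARGET_SR = 48_000  # output sample rate stated above, in Hz


def nyquist(sample_rate: int) -> float:
    """Highest frequency (Hz) representable at the given sample rate."""
    return sample_rate / 2


def missing_band(input_sr: int, target_sr: int = TARGET_SR) -> tuple[float, float]:
    """Frequency range (Hz) the target rate can hold but the input cannot."""
    upper = nyquist(target_sr)
    # An input at or above the target rate has no missing band.
    lower = min(nyquist(input_sr), upper)
    return (lower, upper)


print(missing_band(16_000))  # (8000.0, 24000.0)
```

So a 16kHz recording carries content only up to 8kHz, and the 8-24kHz range is what super-resolution needs to fill in.
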
## Model Info

Based on [AudioSR: Versatile Audio Super-Resolution](https://arxiv.org/abs/2309.07314) by Haohe Liu et al.

Original repository: https://github.com/haoheliu/versatile_audio_super_resolution

**License:** MIT

## Hardware Requirements

- **GPU:** NVIDIA RTX 3060 or higher (6GB+ VRAM minimum)
- **RAM:** 12GB+ recommended
- **Input audio:** Works best when the input sample rate is above 8kHz

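A quick way to check the input sample rate of a WAV file before running the workflow is Python's standard `wave` module. This is a sketch with hypothetical helper names. It assumes an uncompressed PCM WAV file, since `wave` does not read other formats such as MP3 or FLAC.

```python
import wave

MIN_RECOMMENDED_SR = 8_000  # Hz, threshold from the note above


def wav_sample_rate(path: str) -> int:
    """Read the sample rate (Hz) from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def above_recommended_rate(path: str) -> bool:
    """True when the file's sample rate exceeds the 8kHz threshold."""
    return wav_sample_rate(path) > MIN_RECOMMENDED_SR
```

Files at or below 8kHz can still be processed; the check only indicates that results may be weaker.
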
## Credits

- **Research:** [Haohe Liu](https://github.com/haoheliu) et al.
- **Paper:** [AudioSR on arXiv](https://arxiv.org/abs/2309.07314)
- **ComfyUI Integration:** [ComfyUI-AudioSR](https://github.com/Saganaki22/ComfyUI-AudioSR)