metadata
license: cc-by-4.0
tags:
- audio
- audio-super-resolution
- speech
- flow-matching
library_name: pytorch
UniverSR - Speech Only
Vocoder-free speech super-resolution model that upsamples 8/12/16/24 kHz → 48 kHz using flow matching in the complex STFT domain. Trained on speech data for VCTK benchmark evaluation.
For general use across speech, music, and sound effects, see universr-audio (recommended).
Paper: arXiv:2510.00771 | Demo: woongzip1.github.io/universr-demo | Code: github.com/woongzip1/UniverSR
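To make the supported rates concrete, the sketch below works out the upsampling factor and the frequency band that has to be generated for each input rate listed above. It is plain arithmetic based on the Nyquist limit. The helper `band_to_reconstruct` is a hypothetical name introduced here for illustration and is not part of the `universr` package.

```python
# Illustrative arithmetic only; this helper is hypothetical and not part of universr.
TARGET_SR = 48_000                              # output rate stated in the description
INPUT_RATES = (8_000, 12_000, 16_000, 24_000)   # input rates stated in the description


def band_to_reconstruct(input_sr: int, target_sr: int = TARGET_SR) -> tuple[int, int, int]:
    """Return (upsampling factor, lower edge in Hz, upper edge in Hz) of the missing band."""
    if target_sr % input_sr != 0:
        raise ValueError("expected the target rate to be an integer multiple of the input rate")
    factor = target_sr // input_sr
    # By the Nyquist limit, a signal sampled at sr carries content only up to sr / 2,
    # so everything between input_sr / 2 and target_sr / 2 is absent from the input.
    return factor, input_sr // 2, target_sr // 2


for sr in INPUT_RATES:
    factor, low, high = band_to_reconstruct(sr)
    print(f"{sr} Hz -> {TARGET_SR} Hz: x{factor}, missing band {low}-{high} Hz")
```

For an 8 kHz input, for example, the factor is 6 and the band from 4 kHz to 24 kHz is absent from the input.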
Usage
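import torchaudio
from universr import UniverSR

# Load the pretrained speech-only checkpoint onto the GPU.
model = UniverSR.from_pretrained("woongzip1/universr-speech", device="cuda")
# Enhance a low-resolution recording; input_sr=8000 marks the input as 8 kHz.
output = model.enhance("low_res_speech.wav", input_sr=8000)
# Write the result at the 48 kHz output rate.
torchaudio.save("output_48k.wav", output.cpu(), 48000)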
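Before calling `enhance`, it can help to confirm what rate a file was actually stored at. The following is a minimal sketch that reads the rate from a PCM WAV header with Python's standard `wave` module and checks it against the input rates listed in the description. The helpers `wav_sample_rate` and `check_input_rate` are hypothetical names, not part of `universr`, and this card does not state whether `enhance` requires the file's stored rate to equal `input_sr`, so treat the check as an optional sanity test.

```python
import wave

# Input rates listed in the model description; the helpers below are hypothetical.
SUPPORTED_INPUT_RATES = (8_000, 12_000, 16_000, 24_000)


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV header using only the standard library."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_input_rate(path: str) -> int:
    """Return the file's sample rate, or raise if it is not a listed input rate."""
    sample_rate = wav_sample_rate(path)
    if sample_rate not in SUPPORTED_INPUT_RATES:
        raise ValueError(f"{path}: {sample_rate} Hz is not one of {SUPPORTED_INPUT_RATES}")
    return sample_rate
```

Under these assumptions, the returned value could be passed on as `input_sr`, for example `input_sr = check_input_rate("low_res_speech.wav")`.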
Citation
@inproceedings{choi2026universr,
title = {{UniverSR}: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching},
author = {Choi, Woongjib and Lee, Sangmin and Lim, Hyungseob and Kang, Hong-Goo},
booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
year = {2026}
}