metadata
tags:
- audio
- vocoder
- pytorch
- neural-audio
- complex-valued
library_name: pytorch
ComVo: Complex-Valued Neural Vocoder
Model description
ComVo is a complex-valued neural vocoder that generates waveforms through the inverse short-time Fourier transform (iSTFT).
Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.
This enables:
- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Improved waveform synthesis quality
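As a rough illustration of what "native complex arithmetic" buys (this is not ComVo's actual layer code, and the numbers are made up), a single complex weight applied to one spectrogram bin scales its magnitude and rotates its phase in one operation. Scaling the real and imaginary parts with independent real weights has no such coupling.

```python
import cmath
import math

# One STFT bin with magnitude 2.0 and phase 30 degrees (illustrative values).
z = cmath.rect(2.0, math.radians(30))

# One complex weight with gain 0.5 and a 45-degree phase shift.
w = cmath.rect(0.5, math.radians(45))

# Native complex multiplication: magnitudes multiply, phases add.
y = w * z
mag, phase = cmath.polar(y)
print(round(mag, 6), round(math.degrees(phase), 6))

# Split processing: real and imaginary parts are scaled by independent
# real weights, so the result is generally not a scale-and-rotate of z.
y_split = complex(w.real * z.real, w.imag * z.imag)
print(y_split == y)
```

The complex product lands at magnitude 0.5 × 2.0 and phase 30° + 45°, while the split version ends up at a different point in the complex plane.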
The model also introduces:
- Phase quantization for structured phase modeling
- Block-matrix computation for improved training efficiency
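Both ideas can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: `quantize_phase`, `num_levels`, and `complex_mul_as_block` are hypothetical names, uniform rounding of the angle is assumed as the quantization rule, and the block-matrix part shows only the standard identity that a complex product equals a real 2×2 block product, which is the kind of reformulation that lets a single real matrix multiply stand in for separate real and imaginary operations.

```python
import cmath
import math


def quantize_phase(z: complex, num_levels: int) -> complex:
    """Snap the phase of z to the nearest of num_levels uniform angles,
    keeping the magnitude unchanged (assumed rule, for illustration)."""
    step = 2 * math.pi / num_levels
    mag, phase = cmath.polar(z)
    return cmath.rect(mag, round(phase / step) * step)


def complex_mul_as_block(w: complex, z: complex) -> complex:
    """Compute w * z with real arithmetic only, using the block form
    [[a, -b], [b, a]] @ [c, d] for w = a + ib and z = c + id."""
    a, b = w.real, w.imag
    c, d = z.real, z.imag
    return complex(a * c - b * d, b * c + a * d)


z = cmath.rect(1.5, math.radians(50))
q = quantize_phase(z, num_levels=8)  # 8 levels means 45-degree steps
w = complex(0.3, -0.7)
print(abs(q), math.degrees(cmath.phase(q)))
print(abs(complex_mul_as_block(w, z) - w * z))
```

With 8 levels, the 50-degree phase snaps to 45 degrees while the magnitude stays at 1.5, and `complex_mul_as_block(w, z)` agrees with `w * z` up to floating-point error.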
Paper
Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026
https://openreview.net/forum?id=U4GXPqm3Va
Intended use
This model is designed for:
- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features
Input
- Raw waveform tensor of shape [1, T], or features extracted from a waveform
Output
- Generated waveform at 24 kHz
Usage
Load model
from hf_model import ComVoHF
model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()
Inference from waveform
audio = model.from_waveform(wav)
Inference from features
features = model.build_feature_extractor()(wav)
audio = model(features)
Model details
| Model | Parameters | Sampling rate |
|---|---|---|
| Base | 13.28M | 24 kHz |
| Large | 114.56M | 24 kHz |
Evaluation
| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
|---|---|---|---|---|
| Base | 3.6744 | 3.8219 | 4.0727 | 0.8580 |
| Large | 3.7618 | 3.9993 | 4.1639 | 0.8227 |
Resources
Paper: https://openreview.net/forum?id=U4GXPqm3Va
Demo: https://hs-oh-prml.github.io/ComVo/
Code: https://github.com/hs-oh-prml/ComVo
Citation
@inproceedings{
oh2026toward,
title={Toward Complex-Valued Neural Networks for Waveform Generation},
author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
booktitle={ICLR},
year={2026}
}