metadata
tags:
  - audio
  - vocoder
  - pytorch
  - neural-audio
  - complex-valued
library_name: pytorch

ComVo: Complex-Valued Neural Vocoder

Model description

ComVo is an iSTFT-based, complex-valued neural vocoder for waveform generation.
Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.

This enables:

  • Structured modeling of complex spectrograms
  • Adversarial training in the complex domain
  • Improved waveform synthesis quality
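To make "native complex arithmetic" concrete, here is a minimal sketch in plain Python (it is not ComVo code). It shows that multiplying a spectrogram bin by a single complex weight scales its magnitude and shifts its phase together, whereas independent real gains on the real and imaginary parts, used here as a simplified stand-in for separate real-valued processing, do not act as one scale-and-rotate operation. The bin, the weight, and the gains are arbitrary illustrative values.

```python
import cmath

# One STFT bin and one complex weight (arbitrary illustrative values).
z = cmath.rect(2.0, cmath.pi / 6)   # magnitude 2.0, phase 30 degrees
w = cmath.rect(0.5, cmath.pi / 3)   # magnitude 0.5, phase 60 degrees

# Native complex multiplication: magnitudes multiply, phases add.
y = w * z
mag, phase = abs(y), cmath.phase(y)

# Simplified "separate" processing: independent real gains on each part.
# The gains are hypothetical and only show that the two parts are not coupled.
g_re, g_im = 0.5, 0.25
y_sep = complex(g_re * z.real, g_im * z.imag)

print(f"complex product: |y|={mag:.3f}, phase={phase:.3f} rad")
print(f"separate gains:  |y|={abs(y_sep):.3f}, phase={cmath.phase(y_sep):.3f} rad")
```

The complex product lands at magnitude 2.0 × 0.5 and phase 30° + 60°, which is the structure a complex-valued layer can exploit directly.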

The model also introduces:

  • Phase quantization for structured phase modeling
  • Block-matrix computation for improved training efficiency
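The two ideas above can be sketched in plain Python. This model card does not specify the quantizer or the block layout ComVo uses, so the code below rests on assumptions: `quantize_phase` is a hypothetical uniform quantizer with a hypothetical `num_levels` parameter, and `complex_mul_as_block` shows only the standard identity that multiplying by a + ib equals applying the real block matrix [[a, -b], [b, a]] to the (real, imaginary) pair.

```python
import cmath
import math

def quantize_phase(z: complex, num_levels: int) -> complex:
    """Snap the phase of z to the nearest of num_levels uniform angles, keeping |z|."""
    step = 2 * math.pi / num_levels
    snapped = round(cmath.phase(z) / step) * step
    return cmath.rect(abs(z), snapped)

def complex_mul_as_block(w: complex, z: complex) -> tuple:
    """Compute w * z with real arithmetic via the block matrix [[a, -b], [b, a]]."""
    a, b = w.real, w.imag
    x, y = z.real, z.imag
    return (a * x - b * y, b * x + a * y)

z = cmath.rect(1.5, 0.8)               # arbitrary bin: magnitude 1.5, phase 0.8 rad
q = quantize_phase(z, num_levels=8)    # 8 levels give a step of pi/4
re, im = complex_mul_as_block(0.3 - 0.7j, z)

print(abs(q), cmath.phase(q))
print(re, im)
```

Expressing complex products as real block matrices lets them run on ordinary real-valued matrix kernels, which is one plausible reading of the efficiency claim.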

Paper

Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

Intended use

This model is designed for:

  • Neural vocoding
  • Speech synthesis pipelines (e.g., TTS)
  • Audio waveform reconstruction from spectral features

Input

  • Raw waveform of shape [1, T], where T is the number of samples, or features extracted from it

Output

  • Generated waveform at 24 kHz
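As a sketch of the input layout, the snippet below builds a one-second sine tone as a nested list of shape [1, T] and computes its duration. That the model expects its input at 24 kHz is an assumption inferred from the stated output rate. The tone frequency and amplitude are arbitrary, and `sine_wave` is a hypothetical helper.

```python
import math

SAMPLE_RATE = 24_000  # Hz; the stated output rate, assumed here for the input too

def sine_wave(freq_hz: float, seconds: float, amplitude: float = 0.5) -> list:
    """Return a mono sine wave as a nested list with shape [1, T]."""
    num_samples = int(round(seconds * SAMPLE_RATE))
    samples = [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(num_samples)
    ]
    return [samples]  # leading dimension of size 1

wav = sine_wave(440.0, 1.0)
duration_s = len(wav[0]) / SAMPLE_RATE
print(len(wav), len(wav[0]), duration_s)
```

In practice the list would be converted to a tensor, for example with `torch.tensor(wav)`, before being passed to the model.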

Usage

Load model

from hf_model import ComVoHF

# Load the pretrained checkpoint and switch the model to evaluation mode.
model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()

Inference from waveform

# wav: input waveform of shape [1, T]
audio = model.from_waveform(wav)

Inference from features

# Extract features explicitly, then synthesize audio from them.
features = model.build_feature_extractor()(wav)
audio = model(features)
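The two inference paths can be wrapped in one small helper, sketched below. `vocode` and `via_features` are hypothetical names introduced for illustration. The helper calls only the methods shown in this section: `from_waveform`, `build_feature_extractor`, and the model call itself. Whether both paths return identical audio is not stated here, so treat that as an open assumption.

```python
def vocode(model, wav, via_features: bool = False):
    """Run one of the two documented inference paths on an already-loaded model.

    `vocode` and `via_features` are hypothetical names, not part of ComVo.
    """
    if via_features:
        # Path 2: extract features explicitly, then call the model on them.
        extractor = model.build_feature_extractor()
        return model(extractor(wav))
    # Path 1: let the model handle feature extraction internally.
    return model.from_waveform(wav)
```

For inference with a PyTorch model, wrapping the call in `torch.no_grad()` is a common way to avoid tracking gradients.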

Model details

| Model | Parameters | Sampling rate |
|-------|------------|---------------|
| Base  | 13.28M     | 24 kHz        |
| Large | 114.56M    | 24 kHz        |
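For a rough sense of checkpoint size, the parameter counts can be turned into a weight-memory estimate. This is back-of-the-envelope arithmetic under a stated assumption: each counted parameter is stored as one 32-bit float (4 bytes). This card does not say how complex-valued weights are counted. If each counted parameter were a complex64 value, the figures would double.

```python
PARAMS = {"Base": 13.28e6, "Large": 114.56e6}  # parameter counts from the table
BYTES_PER_PARAM = 4  # assumption: one float32 per counted parameter

def weight_megabytes(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

sizes = {name: weight_megabytes(n) for name, n in PARAMS.items()}
for name, mb in sizes.items():
    print(f"{name}: ~{mb:.0f} MB")
```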

Evaluation

| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
|-------|---------|-------------|-------------|----------|
| Base  | 3.6744  | 3.8219      | 4.0727      | 0.8580   |
| Large | 3.7618  | 3.9993      | 4.1639      | 0.8227   |
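To read the scores as relative changes from Base to Large, the short sketch below computes the signed percentage difference per metric and checks it against the arrow direction. The scores are copied from the evaluation results. `relative_change` is a hypothetical helper, and percentage change is only one convenient way to compare the two models.

```python
# Scores copied from the evaluation results; True means higher is better.
METRICS = {
    "UTMOS": (3.6744, 3.7618, True),
    "PESQ (wb)": (3.8219, 3.9993, True),
    "PESQ (nb)": (4.0727, 4.1639, True),
    "MRSTFT": (0.8580, 0.8227, False),
}

def relative_change(base: float, large: float) -> float:
    """Signed change from Base to Large, in percent of the Base score."""
    return 100.0 * (large - base) / base

changes = {}
for name, (base, large, higher_is_better) in METRICS.items():
    change = relative_change(base, large)
    improved = (change > 0) == higher_is_better
    changes[name] = (change, improved)
    print(f"{name}: {change:+.2f}% ({'better' if improved else 'worse'})")
```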

Resources

Paper: https://openreview.net/forum?id=U4GXPqm3Va

Demo: https://hs-oh-prml.github.io/ComVo/

Code: https://github.com/hs-oh-prml/ComVo

Citation

@inproceedings{
  oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={ICLR},
  year={2026}
}