---
tags:
  - audio
  - vocoder
  - pytorch
  - neural-audio
  - complex-valued
library_name: pytorch
---

ComVo: Complex-Valued Neural Vocoder

Model description

ComVo is a complex-valued neural vocoder that generates waveforms through the inverse short-time Fourier transform (iSTFT).
Unlike conventional real-valued vocoders, which process the real and imaginary parts of the spectrogram separately, ComVo operates directly in the complex domain using native complex arithmetic.
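To make the iSTFT step concrete, here is a minimal, dependency-free sketch of how a complex spectrogram is turned back into a waveform by an inverse DFT per frame followed by windowed overlap-add. The window, `n_fft`, and `hop` values are illustrative assumptions, not ComVo's actual configuration, and the function names are hypothetical. In a PyTorch pipeline, `torch.istft` would normally perform this step.

```python
import cmath
import math

def hann(n_fft):
    # Periodic Hann window (an assumed, common choice for STFT processing).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def stft(x, n_fft, hop):
    # Window each frame, then take a naive DFT to get its complex spectrum.
    win = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * win[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ])
    return frames

def istft(frames, n_fft, hop):
    # Inverse DFT per frame, then windowed overlap-add with normalization.
    win = hann(n_fft)
    length = hop * (len(frames) - 1) + n_fft
    out = [0.0] * length
    norm = [0.0] * length
    for t, spec in enumerate(frames):
        for n in range(n_fft):
            sample = sum(
                spec[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)
            ).real / n_fft
            out[t * hop + n] += sample * win[n]
            norm[t * hop + n] += win[n] ** 2
    # Samples covered only by zero-valued window taps cannot be recovered.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

# Round trip on a short sine wave: analysis followed by synthesis.
signal = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
spec = stft(signal, n_fft=8, hop=2)
recon = istft(spec, n_fft=8, hop=2)
```

In an iSTFT-based vocoder, the network predicts the complex frames (here produced by `stft`) and only the synthesis half is applied to obtain audio.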

This enables:

- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Improved waveform synthesis quality

The model also introduces:

- Phase quantization for structured phase modeling
- Block-matrix computation for improved training efficiency
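The two ideas above can be illustrated with a small sketch in plain Python. The first part uses the standard identity that a complex matrix-vector product with W = A + iB equals a real product with the block matrix [[A, -B], [B, A]] applied to the stacked real and imaginary parts. The second part quantizes the phase of a complex number to a fixed number of uniform levels while keeping its magnitude. Whether ComVo uses exactly these forms is an assumption: the function names, the uniform quantization grid, and the `levels` parameter are hypothetical and chosen only for illustration.

```python
import cmath
import math

def complex_matvec(W, z):
    # Reference: native complex arithmetic.
    return [sum(w * v for w, v in zip(row, z)) for row in W]

def block_matvec(W, z):
    # Same product using only real arithmetic on the block matrix
    # [[A, -B], [B, A]], where W = A + iB and z = x + iy.
    A = [[w.real for w in row] for row in W]
    B = [[w.imag for w in row] for row in W]
    n_out = len(W)
    block = [A[i] + [-b for b in B[i]] for i in range(n_out)]
    block += [B[i] + A[i] for i in range(n_out)]
    xy = [v.real for v in z] + [v.imag for v in z]
    out = [sum(m * v for m, v in zip(row, xy)) for row in block]
    # First half of the result is the real part, second half the imaginary part.
    return [complex(out[i], out[n_out + i]) for i in range(n_out)]

def quantize_phase(z, levels=8):
    # Hypothetical uniform scheme: snap the phase to the nearest multiple
    # of 2*pi/levels and leave the magnitude unchanged.
    magnitude, phase = cmath.polar(z)
    step = 2 * math.pi / levels
    return cmath.rect(magnitude, round(phase / step) * step)

W = [[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]]
z = [1 - 1j, 2 + 3j]
native = complex_matvec(W, z)
blocked = block_matvec(W, z)
```

Expressing the complex product as one real block-matrix product lets it run as a single real-valued matrix multiplication, which is one plausible source of the efficiency gain.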

Paper

Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

Intended use

This model is designed for:

- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features

Input

Raw waveform tensor of shape [1, T], or features extracted from it

Output

Generated waveform at 24 kHz

Usage

Load model

from hf_model import ComVoHF

model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()

Inference from waveform

audio = model.from_waveform(wav)

Inference from features

features = model.build_feature_extractor()(wav)
audio = model(features)

Model details

| Model | Parameters | Sampling rate |
|-------|------------|---------------|
| Base  | 13.28M     | 24 kHz        |
| Large | 114.56M    | 24 kHz        |

Evaluation

| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
|-------|---------|-------------|-------------|----------|
| Base  | 3.6744  | 3.8219      | 4.0727      | 0.8580   |
| Large | 3.7618  | 3.9993      | 4.1639      | 0.8227   |

Resources

Paper: https://openreview.net/forum?id=U4GXPqm3Va

Demo: https://hs-oh-prml.github.io/ComVo/

Code: https://github.com/hs-oh-prml/ComVo

Citation

@inproceedings{oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={ICLR},
  year={2026}
}