---
tags:
- audio
- vocoder
- pytorch
- neural-audio
- complex-valued
library_name: pytorch
---

# ComVo: Complex-Valued Neural Vocoder

## Model description

ComVo is a complex-valued neural vocoder for waveform generation based on iSTFT.

Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.

This enables:

- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Improved waveform synthesis quality

The model also introduces:

- Phase quantization for structured phase modeling
- Block-matrix computation for improved training efficiency

## Paper

**Toward Complex-Valued Neural Networks for Waveform Generation**

Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

## Intended use

This model is designed for:

- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features

### Input

- Raw waveform ([1, T]) or extracted features

### Output

- Generated waveform at 24 kHz

## Usage

### Load model

```python
from hf_model import ComVoHF

model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()
```

### Inference from waveform

```python
audio = model.from_waveform(wav)
```

### Inference from features

```python
features = model.build_feature_extractor()(wav)
audio = model(features)
```

## Model details

| Model | Parameters | Sampling rate |
| ----- | ---------- | ------------- |
| Base  | 13.28M     | 24 kHz        |
| Large | 114.56M    | 24 kHz        |

## Evaluation

| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
| ----- | ------- | ----------- | ----------- | -------- |
| Base  | 3.6744  | 3.8219      | 4.0727      | 0.8580   |
| Large | 3.7618  | 3.9993      | 4.1639      | 0.8227   |

## Resources

- Paper: https://openreview.net/forum?id=U4GXPqm3Va
- Demo: https://hs-oh-prml.github.io/ComVo/
- Code: https://github.com/hs-oh-prml/ComVo

## Citation

```bibtex
@inproceedings{oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={ICLR},
  year={2026}
}
```
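
## Example: complex multiplication as a real block matrix

The model description mentions native complex arithmetic and block-matrix computation. As a rough illustration only, the sketch below checks the textbook identity that a complex matrix-vector product with weights W = A + iB equals a real product with the block matrix [[A, -B], [B, A]] applied to the stacked real and imaginary parts. It uses plain Python, the function names are hypothetical, and it is not taken from the ComVo code base; the block-matrix scheme in the paper may differ in its details.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of ComVo.

def complex_matvec(W, x):
    """Multiply a complex matrix W (list of rows) by a complex vector x natively."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def block_matvec(W, x):
    """Compute the same product with real arithmetic only, using

        [ Re(y) ]   [ A  -B ] [ Re(x) ]
        [ Im(y) ] = [ B   A ] [ Im(x) ],   where W = A + iB.
    """
    n_out = len(W)
    # Stack real and imaginary parts of the input into one real vector.
    x_stacked = [v.real for v in x] + [v.imag for v in x]
    # Build the real (2 * n_out) x (2 * n_in) block matrix row by row.
    top = [[w.real for w in row] + [-w.imag for w in row] for row in W]
    bottom = [[w.imag for w in row] + [w.real for w in row] for row in W]
    block = top + bottom
    y_stacked = [sum(b * v for b, v in zip(row, x_stacked)) for row in block]
    # Unstack the real result back into complex numbers.
    return [complex(y_stacked[i], y_stacked[n_out + i]) for i in range(n_out)]


W = [[1 + 2j, 0.5 - 1j],
     [-3j, 2 + 0j]]
x = [1 - 1j, 2 + 0.5j]

native = complex_matvec(W, x)
blocked = block_matvec(W, x)

# Both routes give the same complex output up to floating-point error.
assert all(abs(a - b) < 1e-12 for a, b in zip(native, blocked))
```

A real-valued layer that treats real and imaginary channels as independent inputs has no reason to share the A and B blocks in this way; that sharing is the structural constraint a complex-valued layer carries.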