---
tags:
- audio
- vocoder
- pytorch
- neural-audio
- complex-valued
library_name: pytorch
---

# ComVo: Complex-Valued Neural Vocoder

## Model description

ComVo is a complex-valued neural vocoder that generates waveforms through the inverse short-time Fourier transform (iSTFT).
Conventional real-valued vocoders process the real and imaginary parts as separate channels. ComVo instead operates directly in the complex domain using native complex arithmetic.
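
As a rough illustration of native complex arithmetic feeding an iSTFT output stage, the sketch below projects complex hidden features to complex STFT coefficients with one complex matrix product. It then converts them to a waveform with `torch.istft`. `ComplexISTFTHead`, its layer sizes, `n_fft`, and `hop_length` are hypothetical placeholders, not ComVo's actual architecture or configuration.

```python
import torch
import torch.nn as nn


class ComplexISTFTHead(nn.Module):
    """Hypothetical output head: complex features -> complex STFT -> waveform."""

    def __init__(self, hidden_dim: int = 64, n_fft: int = 1024, hop_length: int = 256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        n_bins = n_fft // 2 + 1
        # Complex weight and bias: the projection is a single complex matrix product,
        # so real and imaginary parts are mixed by complex multiplication rules.
        self.weight = nn.Parameter(torch.randn(hidden_dim, n_bins, dtype=torch.cfloat) / hidden_dim**0.5)
        self.bias = nn.Parameter(torch.zeros(n_bins, dtype=torch.cfloat))
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: complex features, shape [batch, frames, hidden_dim]
        spec = (h @ self.weight + self.bias).transpose(1, 2)  # [batch, n_bins, frames]
        # The inverse STFT turns the one-sided complex spectrogram into a real waveform
        return torch.istft(spec, n_fft=self.n_fft, hop_length=self.hop_length, window=self.window)
```

With `center=True` (the `torch.istft` default), `F` input frames yield `hop_length * (F - 1)` output samples.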

This enables:
- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Higher synthesis quality than comparable real-valued models

The model also introduces:
- Phase quantization, which discretizes phase values to give phase modeling more structure
- Block-matrix computation, which reduces redundant operations to improve training efficiency
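
This card does not specify how either technique is implemented, so the sketch below shows one plausible reading of each. It is not the authors' code. `complex_matmul_block` uses the standard identity `(A + iB)(a + ib) = (Aa - Bb) + i(Ba + Ab)`. That identity evaluates a complex product as one real multiplication with the block matrix `[[A, -B], [B, A]]`. `quantize_phase` rounds each coefficient's phase to one of `levels` uniformly spaced angles and keeps its magnitude. It uses a straight-through estimator so gradients still flow. The level count and the estimator are assumptions.

```python
import math

import torch


def complex_matmul_block(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Compute w @ x for complex w [m, n] and x [n, k] with one real matmul."""
    a, b = w.real, w.imag
    # Real block matrix [[A, -B], [B, A]] of shape [2m, 2n]
    block = torch.cat([torch.cat([a, -b], dim=1), torch.cat([b, a], dim=1)], dim=0)
    stacked = torch.cat([x.real, x.imag], dim=0)  # [2n, k]: real parts over imaginary parts
    out = block @ stacked  # [2m, k]: top half is the real part, bottom half the imaginary part
    m = w.shape[0]
    return torch.complex(out[:m], out[m:])


def quantize_phase(z: torch.Tensor, levels: int = 16) -> torch.Tensor:
    """Snap the phase of each complex entry to one of `levels` angles (hypothetical)."""
    step = 2 * math.pi / levels
    phase = torch.angle(z)
    q_phase = torch.round(phase / step) * step
    # Straight-through estimator: forward uses q_phase, backward treats rounding as identity
    phase_st = phase + (q_phase - phase).detach()
    return torch.polar(torch.abs(z), phase_st)
```

The block form replaces four real products (`Aa`, `Bb`, `Ba`, `Ab`) with a single larger one, which typically maps better onto real-valued GPU kernels.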

## Paper

**Toward Complex-Valued Neural Networks for Waveform Generation**

Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

## Intended use

This model is designed for:
- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features

### Input
- A raw waveform tensor of shape [1, T], passed to `model.from_waveform`, or
- Features produced by `model.build_feature_extractor()`, passed to `model(...)`
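
Audio loaded from disk is often stereo or at another sample rate. The hypothetical helper below turns a waveform into the mono `[1, T]` layout listed above. It assumes the model also expects 24 kHz input, since this card states only the output rate. `torchaudio` is imported only when resampling is actually needed.

```python
import torch

TARGET_SR = 24_000  # output rate listed on this card; assumed to be the expected input rate too


def prepare_waveform(wav: torch.Tensor, sample_rate: int, target_sr: int = TARGET_SR) -> torch.Tensor:
    """Return a mono waveform of shape [1, T] at target_sr (hypothetical helper)."""
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)  # [T] -> [1, T]
    if wav.dim() != 2:
        raise ValueError(f"expected [T] or [channels, T], got {tuple(wav.shape)}")
    # Down-mix multi-channel audio by averaging the channels
    wav = wav.mean(dim=0, keepdim=True)
    if sample_rate != target_sr:
        import torchaudio  # optional dependency, only needed for resampling

        wav = torchaudio.functional.resample(wav, sample_rate, target_sr)
    return wav
```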

### Output
- Generated waveform at 24 kHz

## Usage

### Load model

```python
from hf_model import ComVoHF  # ComVo's Hugging Face wrapper class

model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()  # inference mode (disables training-only behavior)
```

### Inference from waveform

```python
import torchaudio

wav, sr = torchaudio.load("input.wav")  # placeholder path; mono audio, shape [1, T]
audio = model.from_waveform(wav)
```

### Inference from features

```python
import torch

with torch.no_grad():
    features = model.build_feature_extractor()(wav)  # spectral features from the waveform
    audio = model(features)  # generated waveform at 24 kHz
```

## Model details

| Model | Parameters | Sampling rate |
| ----- | ---------- | ------------- |
| Base | 13.28M | 24 kHz |
| Large | 114.56M | 24 kHz |
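
This card does not say whether a complex-valued weight counts as one parameter or as two real numbers. The hypothetical helper below, not part of `hf_model`, counts trainable parameters both ways so the figures can be compared against a loaded checkpoint.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, complex_as_two: bool = False) -> int:
    """Count trainable parameters; optionally count each complex entry as two reals."""
    total = 0
    for p in model.parameters():
        if not p.requires_grad:
            continue
        n = p.numel()
        total += 2 * n if (complex_as_two and p.is_complex()) else n
    return total
```

For example, `print(f"{count_parameters(model) / 1e6:.2f}M")` prints a figure in the same format as the table.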
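
## Evaluation

| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
| ----- | ------- | ----------- | ----------- | -------- |
| Base | 3.6744 | 3.8219 | 4.0727 | 0.8580 |
| Large | 3.7618 | 3.9993 | 4.1639 | 0.8227 |

↑ higher is better; ↓ lower is better. PESQ is reported in wideband (wb) and narrowband (nb) modes, and MRSTFT is a multi-resolution STFT distance between generated and reference audio.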
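
The exact MRSTFT configuration behind these numbers is not given here. The sketch below shows one common definition. For each STFT resolution, it adds the spectral convergence to the L1 distance between log magnitudes, then averages over resolutions. The three `(n_fft, hop, window)` settings are a widely used default, assumed rather than taken from the paper.

```python
import torch


def stft_mag(x: torch.Tensor, n_fft: int, hop: int, win: int) -> torch.Tensor:
    """Magnitude spectrogram, clamped to keep the log finite."""
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop, win_length=win, window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)


def mrstft_distance(pred, target, resolutions=((512, 50, 240), (1024, 120, 600), (2048, 240, 1200))):
    """Average of spectral convergence + log-magnitude L1 over several resolutions (assumed setup)."""
    total = 0.0
    for n_fft, hop, win in resolutions:
        p = stft_mag(pred, n_fft, hop, win)
        t = stft_mag(target, n_fft, hop, win)
        sc = torch.linalg.norm(t - p) / torch.linalg.norm(t)  # spectral convergence
        log_l1 = torch.mean(torch.abs(torch.log(t) - torch.log(p)))  # log-magnitude L1
        total = total + sc + log_l1
    return total / len(resolutions)
```

Because the resolutions, weighting, and averaging all vary between papers, values from this sketch are not directly comparable to the table.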

## Resources

- Paper: https://openreview.net/forum?id=U4GXPqm3Va
- Demo: https://hs-oh-prml.github.io/ComVo/
- Code: https://github.com/hs-oh-prml/ComVo

## Citation

```bibtex
@inproceedings{oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```