---
tags:
- audio
- vocoder
- pytorch
- neural-audio
- complex-valued
library_name: pytorch
---
# ComVo: Complex-Valued Neural Vocoder
## Model description
ComVo is a complex-valued neural vocoder for waveform generation based on iSTFT.
Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.
This enables:
- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Improved waveform synthesis quality
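
As a rough illustration of what "operating in the complex domain" means for an iSTFT-based vocoder, the sketch below runs a plain STFT/iSTFT round trip in PyTorch. The FFT size, hop length, and window are assumptions chosen for the example and are not ComVo's actual configuration; no part of the ComVo network is involved.

```python
import torch

# Illustrative STFT settings; these are assumptions, not ComVo's configuration.
N_FFT, HOP = 1024, 256
window = torch.hann_window(N_FFT)


def stft_roundtrip(wav: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (complex spectrogram, waveform reconstructed by iSTFT)."""
    # return_complex=True yields a complex tensor of shape [B, N_FFT // 2 + 1, frames].
    spec = torch.stft(wav, N_FFT, hop_length=HOP, window=window, return_complex=True)
    # Magnitude and phase are both carried by this single complex tensor.
    recon = torch.istft(spec, N_FFT, hop_length=HOP, window=window, length=wav.shape[-1])
    return spec, recon


torch.manual_seed(0)
wav = torch.randn(1, 24000)  # one second of noise at 24 kHz, shape [1, T]
spec, recon = stft_roundtrip(wav)
```

A vocoder of this family predicts a tensor like `spec` from acoustic features and hands it to the iSTFT, so the network output is complex-valued by construction.
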
The model also introduces:
- Phase quantization for structured phase modeling
- Block-matrix computation for improved training efficiency
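
The two ideas above can be sketched generically. This card does not spell out the paper's exact quantization scheme or block layout, so the function names, the uniform phase grid, and the `[[A, -B], [B, A]]` arrangement below are assumptions based on standard complex arithmetic, not the authors' implementation.

```python
import math
import torch


def quantize_phase(z: torch.Tensor, levels: int) -> torch.Tensor:
    """Snap the phase of z to `levels` uniform steps, keeping the magnitude."""
    step = 2 * math.pi / levels
    phase_q = torch.round(torch.angle(z) / step) * step
    return torch.polar(torch.abs(z), phase_q)


def complex_matmul_block(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Compute w @ x for complex 2-D w and x using one real-valued matmul.

    Relies on (A + iB)(c + id) = (Ac - Bd) + i(Bc + Ad), which equals
    [[A, -B], [B, A]] @ [c; d] in real arithmetic.
    """
    a, b = w.real, w.imag
    block = torch.cat([torch.cat([a, -b], dim=1), torch.cat([b, a], dim=1)], dim=0)
    stacked = torch.cat([x.real, x.imag], dim=0)
    out = block @ stacked
    n = w.shape[0]
    # The top half holds the real part, the bottom half the imaginary part.
    return torch.complex(out[:n], out[n:])


torch.manual_seed(0)
w = torch.randn(4, 3, dtype=torch.complex64)
x = torch.randn(3, 5, dtype=torch.complex64)
y_block = complex_matmul_block(w, x)
y_native = w @ x
```

The block form replaces four separate real products with a single larger real matmul, which is one plausible source of the efficiency gain mentioned above.
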
## Paper
**Toward Complex-Valued Neural Networks for Waveform Generation**
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026
https://openreview.net/forum?id=U4GXPqm3Va
## Intended use
This model is designed for:
- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features
### Input
- Raw waveform tensor of shape `[1, T]`, or features produced by `model.build_feature_extractor()`
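
A small helper for getting audio into the `[1, T]` layout might look like the sketch below. `to_model_input` is a hypothetical name, and downmixing by channel averaging is an assumption; the card does not say how multi-channel input should be handled. Resampling is not covered here, and the waveform is presumably expected at the model's 24 kHz rate.

```python
import torch


def to_model_input(wav: torch.Tensor) -> torch.Tensor:
    """Coerce a [T] or [C, T] waveform to a mono float32 tensor of shape [1, T]."""
    wav = wav.to(torch.float32)
    if wav.dim() == 1:
        return wav.unsqueeze(0)  # [T] -> [1, T]
    if wav.dim() == 2:
        return wav.mean(dim=0, keepdim=True)  # [C, T] -> [1, T] (assumed downmix)
    raise ValueError(f"expected a 1-D or 2-D waveform, got shape {tuple(wav.shape)}")


mono = to_model_input(torch.zeros(24000))
stereo = to_model_input(torch.ones(2, 24000))
```
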
### Output
- Generated waveform at 24 kHz
## Usage
### Load model
```python
from hf_model import ComVoHF

# Load the pretrained Base checkpoint and switch to inference mode
model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()
```
### Inference from waveform
```python
# wav: waveform tensor of shape [1, T] (see Input)
audio = model.from_waveform(wav)
```
### Inference from features
```python
# Extract features from the waveform, then synthesize audio from them
features = model.build_feature_extractor()(wav)
audio = model(features)
```
## Model details
| Model | Parameters | Sampling rate |
| ----- | ---------- | ------------- |
| Base | 13.28M | 24 kHz |
| Large | 114.56M | 24 kHz |
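
To check a parameter count locally, a helper like the one below can be used. The card does not state whether each complex-valued weight is counted once or as two real numbers, so the `complex_as_two` switch is an assumption added to cover both conventions. `count_parameters` is a hypothetical helper, not part of the ComVo code.

```python
import torch
from torch import nn


def count_parameters(module: nn.Module, complex_as_two: bool = False) -> int:
    """Count trainable parameters; optionally count each complex entry as two reals."""
    total = 0
    for p in module.parameters():
        if not p.requires_grad:
            continue
        weight = 2 if (complex_as_two and p.is_complex()) else 1
        total += weight * p.numel()
    return total


real_layer = nn.Linear(4, 3)  # 4 * 3 weights + 3 biases
complex_params = nn.ParameterList(
    [nn.Parameter(torch.zeros(2, 3, dtype=torch.complex64))]
)
n_real = count_parameters(real_layer)
n_complex = count_parameters(complex_params)
n_complex_as_reals = count_parameters(complex_params, complex_as_two=True)
```

Assuming `ComVoHF` is a `torch.nn.Module`, `count_parameters(model) / 1e6` gives the count in millions for comparison with the table.
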
## Evaluation
| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
| ----- | ------- | ----------- | ----------- | -------- |
| Base | 3.6744 | 3.8219 | 4.0727 | 0.8580 |
| Large | 3.7618 | 3.9993 | 4.1639 | 0.8227 |
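
MRSTFT is a multi-resolution STFT distance between generated and reference audio. The sketch below follows a common formulation (spectral convergence plus log-magnitude error, averaged over several resolutions). The resolutions and the exact terms are assumptions, so it is not guaranteed to reproduce the numbers in the table.

```python
import torch

# (n_fft, hop, win_length) triples; illustrative assumptions, not the paper's settings.
RESOLUTIONS = ((512, 128, 512), (1024, 256, 1024), (2048, 512, 2048))


def stft_magnitude(x: torch.Tensor, n_fft: int, hop: int, win: int) -> torch.Tensor:
    """Magnitude spectrogram of a [B, T] waveform, floored to keep the log finite."""
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)


def mrstft_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Average of spectral convergence + log-magnitude L1 over all resolutions."""
    total = pred.new_zeros(())
    for n_fft, hop, win in RESOLUTIONS:
        p = stft_magnitude(pred, n_fft, hop, win)
        t = stft_magnitude(target, n_fft, hop, win)
        spectral_convergence = torch.linalg.norm(t - p) / torch.linalg.norm(t)
        log_magnitude = (t.log() - p.log()).abs().mean()
        total = total + spectral_convergence + log_magnitude
    return total / len(RESOLUTIONS)


torch.manual_seed(0)
reference = torch.randn(1, 24000)
degraded = reference + 0.1 * torch.randn(1, 24000)
d_same = mrstft_distance(reference, reference)
d_diff = mrstft_distance(degraded, reference)
```

Lower is better: identical signals give a distance of zero, and the value grows as the spectra diverge.
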
## Resources
- Paper: https://openreview.net/forum?id=U4GXPqm3Va
- Demo: https://hs-oh-prml.github.io/ComVo/
- Code: https://github.com/hs-oh-prml/ComVo
## Citation
```bibtex
@inproceedings{
oh2026toward,
title={Toward Complex-Valued Neural Networks for Waveform Generation},
author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
booktitle={ICLR},
year={2026}
}
```