	---
	tags:
	- audio
	- vocoder
	- pytorch
	- neural-audio
	- complex-valued
	library_name: pytorch
	---

	# ComVo: Complex-Valued Neural Vocoder

	## Model description

	ComVo is a complex-valued, iSTFT-based neural vocoder for waveform generation.
	Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.

	This enables:
	- Structured modeling of complex spectrograms
	- Adversarial training in the complex domain
	- Improved waveform synthesis quality

	The model also introduces:
	- Phase quantization for structured phase modeling
	- Block-matrix computation for improved training efficiency
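
The two ideas listed above can be illustrated generically in plain Python. This is a minimal sketch, not the ComVo implementation: the function names (`quantize_phase`, `complex_matvec`, `block_matvec`) are hypothetical, the uniform phase grid is an assumption rather than the paper's scheme, and the block form shown is only the standard identity that a complex weight `A + iB` acts on `x + iy` like the real block matrix `[[A, -B], [B, A]]` acts on `[x; y]`.

```python
import cmath
import math


def quantize_phase(z: complex, levels: int) -> complex:
    """Snap the phase of z to the nearest of `levels` evenly spaced angles.

    The magnitude is kept unchanged. The uniform grid is an assumption
    made for illustration only.
    """
    step = 2.0 * math.pi / levels
    snapped = round(cmath.phase(z) / step) * step
    return cmath.rect(abs(z), snapped)


def complex_matvec(W, z):
    """Matrix-vector product using native complex arithmetic."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def block_matvec(W, z):
    """Same product using only real arithmetic.

    With W = A + iB and z = x + iy:
        real part = A x - B y
        imag part = B x + A y
    which is the block matrix [[A, -B], [B, A]] applied to [x; y].
    """
    A = [[w.real for w in row] for row in W]
    B = [[w.imag for w in row] for row in W]
    x = [v.real for v in z]
    y = [v.imag for v in z]
    out = []
    for row_a, row_b in zip(A, B):
        re = sum(a * xi - b * yi for a, b, xi, yi in zip(row_a, row_b, x, y))
        im = sum(b * xi + a * yi for a, b, xi, yi in zip(row_a, row_b, x, y))
        out.append(complex(re, im))
    return out
```

Both `complex_matvec` and `block_matvec` return the same vector up to floating-point error, which is why a complex layer can be evaluated with ordinary real-valued matrix kernels.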

	## Paper

	Toward Complex-Valued Neural Networks for Waveform Generation
	Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
	ICLR 2026

	https://openreview.net/forum?id=U4GXPqm3Va

	## Intended use

	This model is designed for:
	- Neural vocoding
	- Speech synthesis pipelines (e.g., TTS)
	- Audio waveform reconstruction from spectral features

	### Input
	- Raw waveform tensor of shape `[1, T]`, or features extracted from it with the model's feature extractor

	### Output
	- Generated waveform at 24 kHz

	## Usage

	### Load model

	```python
	from hf_model import ComVoHF

	model = ComVoHF.from_pretrained("hsoh/ComVo-base")
	model.eval()
	```

	### Inference from waveform

	```python
	# wav: input waveform tensor of shape [1, T]
	audio = model.from_waveform(wav)
	```
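
The snippet above assumes `wav` already exists. As one possible way to obtain it, the sketch below reads a 16-bit mono WAV file with the Python standard library and returns samples in a `[1, T]` layout. The helper name `load_pcm16_mono` is hypothetical, and requiring a 24 kHz input is an assumption based on the model's output rate. Libraries such as torchaudio or soundfile are common alternatives.

```python
import array
import sys
import wave


def load_pcm16_mono(path: str, expected_rate: int = 24000):
    """Read a 16-bit mono PCM WAV file.

    Returns (samples, rate), where samples is a nested list of shape
    [1, T] with values scaled to [-1.0, 1.0).
    """
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = f.getframerate()
        if rate != expected_rate:
            # Assumption: the input should match the 24 kHz model rate.
            raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
        pcm = array.array("h")
        pcm.frombytes(f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [[s / 32768.0 for s in pcm]], rate
```

Passing the returned nested list to `torch.tensor(...)` yields a float tensor of shape `[1, T]` that can serve as `wav`.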

	### Inference from features

	```python
	# Extract features from the waveform first, then run the vocoder on them
	features = model.build_feature_extractor()(wav)
	audio = model(features)
	```

	## Model details

	| Model | Parameters | Sampling rate |
	| ----- | ---------- | ------------- |
	| Base | 13.28M | 24 kHz |
	| Large | 114.56M | 24 kHz |

	## Evaluation

	| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
	| ----- | ------- | ----------- | ----------- | -------- |
	| Base | 3.6744 | 3.8219 | 4.0727 | 0.8580 |
	| Large | 3.7618 | 3.9993 | 4.1639 | 0.8227 |

	## Resources
	Paper: https://openreview.net/forum?id=U4GXPqm3Va

	Demo: https://hs-oh-prml.github.io/ComVo/

	Code: https://github.com/hs-oh-prml/ComVo

	## Citation
	```bibtex
	@inproceedings{
	oh2026toward,
	title={Toward Complex-Valued Neural Networks for Waveform Generation},
	author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
	booktitle={ICLR},
	year={2026}
	}
	```