@@ -1,10 +1,99 @@
 ---
 tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

 ---
 tags:
+- audio
+- vocoder
+- pytorch
+- neural-audio
+- complex-valued
+library_name: pytorch
 ---
+# ComVo: Complex-Valued Neural Vocoder
+## Model description
+ComVo is a complex-valued neural vocoder that generates waveforms through the inverse short-time Fourier transform (iSTFT).
+Unlike conventional real-valued vocoders that process real and imaginary parts separately, ComVo operates directly in the complex domain using native complex arithmetic.
+This enables:
+- Structured modeling of complex spectrograms
+- Adversarial training in the complex domain
+- Improved waveform synthesis quality
+The model also introduces:
+- Phase quantization for structured phase modeling
+- Block-matrix computation for improved training efficiency
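The two bullets above can be illustrated with a minimal pure-Python sketch. It rests on two assumptions that this model card does not confirm. First, that phase quantization means snapping the phase of each complex value to one of `num_levels` uniformly spaced angles while keeping its magnitude. Second, that block-matrix computation refers to the standard identity that evaluates a complex product W z, with W = A + iB and z = x + iy, as the real block product [[A, -B], [B, A]] [x; y]. The function names and the `num_levels` parameter are hypothetical, and the paper's actual formulation may differ.

```python
import cmath
import math


def quantize_phase(z: complex, num_levels: int) -> complex:
    """Snap the phase of z to the nearest of num_levels uniform angles.

    Assumption: uniform quantization with step 2*pi/num_levels;
    the magnitude is left unchanged.
    """
    if num_levels < 1:
        raise ValueError("num_levels must be positive")
    step = 2.0 * math.pi / num_levels
    magnitude, phase = cmath.polar(z)
    return cmath.rect(magnitude, round(phase / step) * step)


def complex_matvec(w, z):
    """Reference: complex matrix times complex vector, native arithmetic."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in w]


def block_matvec(w, z):
    """Same product using only real arithmetic on the block form.

    [Re(Wz); Im(Wz)] = [[A, -B], [B, A]] @ [x; y]
    """
    a = [[v.real for v in row] for row in w]
    b = [[v.imag for v in row] for row in w]
    # Build the real (2n x 2m) block matrix row by row.
    top = [row_a + [-v for v in row_b] for row_a, row_b in zip(a, b)]
    bottom = [row_b + row_a for row_a, row_b in zip(a, b)]
    block = top + bottom
    # Stack real parts on top of imaginary parts.
    stacked = [v.real for v in z] + [v.imag for v in z]
    out = [sum(m * s for m, s in zip(row, stacked)) for row in block]
    n = len(w)
    return [complex(out[i], out[n + i]) for i in range(n)]
```

Both `complex_matvec` and `block_matvec` return the same values up to floating-point rounding; the block form simply expresses the complex product as a single real matrix multiplication.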
+## Paper
+**Toward Complex-Valued Neural Networks for Waveform Generation**
+Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
+ICLR 2026
+https://openreview.net/forum?id=U4GXPqm3Va
+## Intended use
+This model is designed for:
+- Neural vocoding
+- Speech synthesis pipelines (e.g., TTS)
+- Audio waveform reconstruction from spectral features
+### Input
+- Raw waveform with shape `[1, T]`, or features extracted from a waveform
+### Output
+- Generated waveform at 24 kHz
+## Usage
+### Load model
+```python
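+# hf_model is assumed to be the module shipped with the ComVo code (see Resources)
+from hf_model import ComVoHF
+model = ComVoHF.from_pretrained("hsoh/ComVo-base")
+model.eval()  # evaluation mode for inference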
+```
+### Inference from waveform
+```python
+# wav: waveform tensor of shape [1, T] (see Input)
+audio = model.from_waveform(wav)
+```
+### Inference from features
+```python
+# Extract features from a [1, T] waveform, then run the vocoder on them
+features = model.build_feature_extractor()(wav)
+audio = model(features)
+```
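Both inference snippets assume that `wav` already exists. A minimal sketch of preparing it with only the Python standard library follows: it reads a 16-bit PCM mono WAV file into a nested list of shape [1, T] with samples scaled to [-1, 1]. The helper name `load_mono_wav` is hypothetical, and the assumptions that the model expects 24 kHz input in that value range are not confirmed by this model card.

```python
import array
import sys
import wave


def load_mono_wav(path):
    """Read a 16-bit PCM mono WAV file.

    Returns (samples, sample_rate), where samples has shape [1, T]:
    one row holding T floats in [-1, 1].
    """
    with wave.open(path, "rb") as reader:
        if reader.getnchannels() != 1:
            raise ValueError("expected a mono file")
        if reader.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        sample_rate = reader.getframerate()
        raw = reader.readframes(reader.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    samples = [value / 32768.0 for value in pcm]
    return [samples], sample_rate
```

With PyTorch installed, `torch.tensor(samples)` turns the nested list into a `[1, T]` float tensor. If `sample_rate` is not 24000, resampling before inference is presumably needed, which is again an assumption based on the 24 kHz output rate stated above.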
+## Model details
+| Model | Parameters | Sampling rate |
+| ----- | ---------- | ------------- |
+| Base  | 13.28M     | 24 kHz        |
+| Large | 114.56M    | 24 kHz        |
+## Evaluation
+| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
+| ----- | ------- | ----------- | ----------- | -------- |
+| Base  | 3.6744  | 3.8219      | 4.0727      | 0.8580   |
+| Large | 3.7618  | 3.9993      | 4.1639      | 0.8227   |
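This card does not define the MRSTFT column. A minimal sketch follows, assuming it denotes a multi-resolution STFT distance of the kind commonly used for vocoders: spectral convergence plus a mean log-magnitude L1 term, averaged over several STFT resolutions. The function names, the tiny `resolutions`, the Hann window, and `eps` are illustrative choices, so this sketch will not reproduce the numbers in the table.

```python
import cmath
import math


def stft_magnitude(signal, n_fft, hop):
    """Magnitude STFT with a Hann window, computed by a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def stft_distance(reference, generated, n_fft, hop, eps=1e-7):
    """Spectral convergence plus mean log-magnitude L1 at one resolution."""
    ref = stft_magnitude(reference, n_fft, hop)
    gen = stft_magnitude(generated, n_fft, hop)
    diff_sq = ref_sq = log_l1 = 0.0
    count = 0
    for ref_frame, gen_frame in zip(ref, gen):
        for r, g in zip(ref_frame, gen_frame):
            diff_sq += (r - g) ** 2
            ref_sq += r ** 2
            log_l1 += abs(math.log(r + eps) - math.log(g + eps))
            count += 1
    if count == 0:
        raise ValueError("signals are shorter than n_fft")
    spectral_convergence = math.sqrt(diff_sq) / (math.sqrt(ref_sq) + eps)
    return spectral_convergence + log_l1 / count


def mrstft(reference, generated, resolutions=((16, 4), (32, 8), (64, 16))):
    """Average the single-resolution distance over (n_fft, hop) pairs."""
    scores = [stft_distance(reference, generated, n, h) for n, h in resolutions]
    return sum(scores) / len(scores)
```

Identical signals give a distance of zero, and the value grows as the generated magnitudes drift from the reference, which matches the "lower is better" arrow in the table header.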
+## Resources
+- Paper: https://openreview.net/forum?id=U4GXPqm3Va
+- Demo: https://hs-oh-prml.github.io/ComVo/
+- Code: https://github.com/hs-oh-prml/ComVo
+## Citation
+```bibtex
+@inproceedings{
+  oh2026toward,
+  title={Toward Complex-Valued Neural Networks for Waveform Generation},
+  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
+  booktitle={ICLR},
+  year={2026}
+}
+```