---
tags:
- audio
- vocoder
- pytorch
- neural-audio
- complex-valued
library_name: pytorch
---

# ComVo: Complex-Valued Neural Vocoder

## Model description

ComVo is a complex-valued neural vocoder that generates waveforms via the inverse short-time Fourier transform (iSTFT). Unlike conventional real-valued vocoders, which process the real and imaginary parts of the spectrogram separately, ComVo operates directly in the complex domain using native complex arithmetic.

This enables:
- Structured modeling of complex spectrograms
- Adversarial training in the complex domain
- Improved waveform synthesis quality

The model also introduces:
- Phase quantization for structured phase modeling
- Block-matrix computation for improved training efficiency

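The block-matrix computation noted above builds on the standard identity that multiplying by a complex number `c + dj` is equivalent to applying the real 2×2 matrix `[[c, -d], [d, c]]` to the vector `[Re(z), Im(z)]`. A minimal NumPy check of that identity (an illustration only, not ComVo's implementation):

```python
import numpy as np

# Complex multiply z * w, with w = c + dj represented as the real block
# matrix [[c, -d], [d, c]] acting on the vector [Re(z), Im(z)].
z = 1.0 + 2.0j
w = 3.0 - 1.0j

block = np.array([[w.real, -w.imag],
                  [w.imag,  w.real]])
re, im = block @ np.array([z.real, z.imag])

# The block-matrix product matches native complex multiplication
direct = z * w
assert np.allclose([re, im], [direct.real, direct.imag])
```

Expressing complex products this way lets a framework's real-valued, highly optimized matrix kernels carry out complex arithmetic in batch, which is the usual motivation for such a formulation.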
## Paper

**Toward Complex-Valued Neural Networks for Waveform Generation**
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
ICLR 2026

https://openreview.net/forum?id=U4GXPqm3Va

## Intended use

This model is designed for:
- Neural vocoding
- Speech synthesis pipelines (e.g., TTS)
- Audio waveform reconstruction from spectral features

### Input
- Raw waveform of shape `[1, T]`, or extracted features

### Output
- Generated waveform at 24 kHz

## Usage

### Load model

```python
from hf_model import ComVoHF

model = ComVoHF.from_pretrained("hsoh/ComVo-base")
model.eval()
```

### Inference from waveform

```python
import torch

# wav: waveform tensor of shape [1, T]
with torch.inference_mode():
    audio = model.from_waveform(wav)
```

### Inference from features

```python
with torch.inference_mode():
    features = model.build_feature_extractor()(wav)
    audio = model(features)
```

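Since ComVo synthesizes audio via iSTFT, the underlying round trip can be sketched in plain PyTorch. This is illustrative only: `n_fft` and `hop` below are arbitrary example values, not ComVo's actual configuration, and the identity step stands in for the model's complex-valued layers.

```python
import torch

# Arbitrary example STFT parameters (not ComVo's configuration)
n_fft, hop = 1024, 256
wav = torch.randn(1, 24000)              # 1 second of audio at 24 kHz
window = torch.hann_window(n_fft)

# Native complex spectrogram, the kind of tensor a
# complex-valued network operates on directly
spec = torch.stft(wav, n_fft, hop_length=hop, window=window,
                  return_complex=True)

# Identity "processing" step (multiply by exp(i*0)); a real model
# would apply complex-valued layers here instead
spec = spec * torch.exp(torch.zeros_like(spec.real) * 1j)

# Invert back to a waveform of the original length
out = torch.istft(spec, n_fft, hop_length=hop, window=window,
                  length=wav.shape[-1])
assert out.shape == wav.shape
```

With a Hann window and `hop = n_fft / 4` the overlap-add condition is satisfied, so the round trip reconstructs the input waveform up to numerical precision.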
## Model details

| Model | Parameters | Sampling rate |
| ----- | ---------- | ------------- |
| Base  | 13.28M     | 24 kHz        |
| Large | 114.56M    | 24 kHz        |

## Evaluation

| Model | UTMOS ↑ | PESQ (wb) ↑ | PESQ (nb) ↑ | MRSTFT ↓ |
| ----- | ------- | ----------- | ----------- | -------- |
| Base  | 3.6744  | 3.8219      | 4.0727      | 0.8580   |
| Large | 3.7618  | 3.9993      | 4.1639      | 0.8227   |

## Resources

- Paper: https://openreview.net/forum?id=U4GXPqm3Va
- Demo: https://hs-oh-prml.github.io/ComVo/
- Code: https://github.com/hs-oh-prml/ComVo

## Citation

```bibtex
@inproceedings{oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={ICLR},
  year={2026}
}
```