hsoh committed on
Commit fff6c6b · verified · 1 Parent(s): 9bbd232

Update README.md

  1. README.md +124 -3
README.md CHANGED
@@ -1,3 +1,124 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - audio
+ - vocoder
+ - speech
+ - cvnn
+ - istft
+ - pytorch
+ pipeline_tag: audio-to-audio
+ ---
+
+ # ComVo: Complex-Valued Neural Vocoder for Waveform Generation
+
+ **[ICLR 2026] Toward Complex-Valued Neural Networks for Waveform Generation**
+ Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
+
+ - 📄 [OpenReview Paper](https://openreview.net/forum?id=U4GXPqm3Va)
+ - 🔊 [Audio Samples](https://hs-oh-prml.github.io/ComVo/)
+ - 💻 [Code Repository](https://github.com/hs-oh-prml/ComVo)
+
+ ---
+
+ ## Overview
+
+ ComVo is a neural vocoder for waveform generation based on iSTFT.
+ It models complex-valued spectrograms and synthesizes waveforms via inverse short-time Fourier transform.
+
+ Conventional iSTFT-based vocoders typically process real and imaginary components separately.
+ ComVo instead operates in the complex domain, allowing the model to capture structural relationships between magnitude and phase more effectively.
+
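To make the magnitude/phase relationship above concrete, here is a minimal sketch using only the Python standard library. It is not taken from the ComVo code base; the helper names `to_polar` and `from_polar` and the example bin value are assumptions chosen for illustration. It shows that one complex STFT bin carries magnitude and phase jointly, so a pure phase rotation changes the real and imaginary parts together.

```python
import cmath
import math


def to_polar(z: complex) -> tuple[float, float]:
    """Return (magnitude, phase in radians) of one complex STFT bin."""
    return abs(z), cmath.phase(z)


def from_polar(magnitude: float, phase: float) -> complex:
    """Rebuild the complex bin from its magnitude and phase."""
    return cmath.rect(magnitude, phase)


# Hypothetical STFT bin, chosen only for illustration.
bin_value = complex(3.0, 4.0)
magnitude, phase = to_polar(bin_value)

# A quarter-turn phase rotation is a single complex multiplication by 1j.
# The magnitude is unchanged, but the real and imaginary parts both change.
rotated = bin_value * 1j
rotated_magnitude, rotated_phase = to_polar(rotated)
phase_shift = rotated_phase - phase  # expected to be pi / 2
```

A model that treats the real and imaginary parts as two unrelated real channels has to learn this coupling from data, whereas complex arithmetic expresses it directly.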
+ ---
+
+
+ ## Method
+
+ ComVo is built on the following components:
+
+ - **Complex-domain modeling**
+ The generator and discriminator operate on complex-valued representations.
+
+ - **Adversarial training in the complex domain**
+ The discriminator provides feedback directly on complex spectrograms.
+
+ - **Phase quantization**
+ Phase values are discretized to regularize learning and guide phase transformation.
+
+ - **Block-matrix computation**
+ A structured computation scheme that reduces redundant operations.
+
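Two of the components listed above can be illustrated with a small standard-library sketch. This is a guess at the general idea, not the authors' implementation: the uniform phase grid in `quantize_phase`, the `levels` parameter, and the helper names are all assumptions, and the scheme used in the paper may differ. The second function shows one common block-matrix form of a complex product: with W = A + iB and x = u + iv, a single real product with [[A, -B], [B, A]] applied to the stacked vector [u; v] yields the real and imaginary parts of Wx.

```python
import cmath
import math


def quantize_phase(z: complex, levels: int) -> complex:
    """Snap the phase of z to the nearest of `levels` uniform angles.

    The magnitude is kept unchanged. The uniform grid is an assumption
    made for this sketch.
    """
    step = 2.0 * math.pi / levels
    snapped = round(cmath.phase(z) / step) * step
    return cmath.rect(abs(z), snapped)


def real_matvec(matrix, vector):
    """Plain real-valued matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def complex_matvec_block(w, x):
    """Compute W @ x for complex W and x with one real block product."""
    # Top block row [A, -B] produces Re(Wx) = A u - B v.
    top = [[c.real for c in row] + [-c.imag for c in row] for row in w]
    # Bottom block row [B, A] produces Im(Wx) = B u + A v.
    bottom = [[c.imag for c in row] + [c.real for c in row] for row in w]
    stacked = [c.real for c in x] + [c.imag for c in x]
    out = real_matvec(top + bottom, stacked)
    n = len(w)
    return [complex(out[i], out[n + i]) for i in range(n)]
```

For example, `quantize_phase(cmath.rect(2.0, 0.1), levels=4)` snaps the phase to 0 and keeps the magnitude 2, and `complex_matvec_block` agrees with a direct complex product up to floating-point rounding.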
+ ---
+
+
+ ## Model Details
+
+ - **Architecture**: GAN-based neural vocoder
+ - **Representation**: Complex spectrogram
+ - **Sampling rate**: 24 kHz
+ - **Framework**: PyTorch ≥ 2.0
+
+ ---
+
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Inference
+
+ ```bash
+ python infer.py \
+   -c configs/configs.yaml \
+   --ckpt /path/to/comvo.ckpt \
+   --wavfile /path/to/input.wav \
+   --out_dir ./results
+ ```
+
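The command above handles one file at a time. As a convenience, the sketch below builds the same command for every `.wav` file in a directory. It uses only the flags shown above (`-c`, `--ckpt`, `--wavfile`, `--out_dir`); the helper names `build_infer_command` and `build_batch` are hypothetical and are not part of the ComVo repository.

```python
from pathlib import Path


def build_infer_command(wavfile, ckpt, config="configs/configs.yaml", out_dir="./results"):
    """Return the infer.py invocation from the README as an argument list."""
    return [
        "python", "infer.py",
        "-c", str(config),
        "--ckpt", str(ckpt),
        "--wavfile", str(wavfile),
        "--out_dir", str(out_dir),
    ]


def build_batch(wav_dir, ckpt, **kwargs):
    """Build one command per .wav file in wav_dir, in sorted order."""
    return [
        build_infer_command(path, ckpt, **kwargs)
        for path in sorted(Path(wav_dir).glob("*.wav"))
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building an argument list instead of a shell string avoids quoting problems with paths that contain spaces.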
+ ### Training
+
+ ```bash
+ python train.py -c configs/configs.yaml
+ ```
+
+ Configuration details are specified in `configs/configs.yaml`.
+
+ ## Pretrained Model
+
+ A pretrained checkpoint is provided for inference.
+
+ - Checkpoint: https://works.do/xM2ttS4
+ - Configuration: `configs/configs.yaml`
+ - Sampling rate: 24 kHz
+
+ Please ensure that the configuration file matches the checkpoint when running inference.
+
+ ---
+
+ ## Limitations
+
+ - The model is trained for 24 kHz audio and may not generalize to other sampling rates.
+ - A GPU is recommended for efficient inference and training.
+ - Complex-valued operations may not be fully supported in all deployment environments.
+
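Because the model targets 24 kHz audio, it can help to check an input file's sampling rate before running inference. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only. The function names and the `EXPECTED_SAMPLE_RATE` constant are assumptions for illustration and are not part of the ComVo code.

```python
import wave

# Sampling rate stated in the model card (24 kHz).
EXPECTED_SAMPLE_RATE = 24_000


def read_sample_rate(path: str) -> int:
    """Return the sampling rate in Hz stored in a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def matches_model_rate(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> bool:
    """True if the file's sampling rate equals the rate the model expects."""
    return read_sample_rate(path) == expected
```

If `matches_model_rate` returns `False`, resample the file to 24 kHz with an audio tool of your choice before passing it to `infer.py`.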
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{
+ oh2026toward,
+ title={Toward Complex-Valued Neural Networks for Waveform Generation},
+ author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
+ booktitle={International Conference on Learning Representations (ICLR)},
+ year={2026},
+ url={https://openreview.net/forum?id=U4GXPqm3Va}
+ }
+ ```
+
+ ## Acknowledgements
+
+ For additional details, please refer to the paper and the project page with audio samples.