ComVo: Complex-Valued Neural Vocoder for Waveform Generation
[ICLR 2026] Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
- OpenReview Paper
- Audio Samples
- Code Repository
Overview
ComVo is a neural vocoder for waveform generation based on the inverse short-time Fourier transform (iSTFT).
It models complex-valued spectrograms and synthesizes waveforms from them with the iSTFT.
Conventional iSTFT-based vocoders typically process real and imaginary components separately.
ComVo instead operates in the complex domain, allowing the model to capture structural relationships between magnitude and phase more effectively.
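To make the iSTFT step concrete, here is a minimal NumPy sketch of the round trip that a complex-spectrogram vocoder relies on. An STFT stores one complex number per time-frequency bin, so magnitude and phase travel together, and a weighted overlap-add iSTFT turns such a spectrogram back into a waveform. The `n_fft=1024`, `hop=256`, and Hann-window settings are assumptions for illustration and are not taken from ComVo's configuration. In ComVo the spectrogram would come from the generator rather than from an analysis STFT, and in PyTorch `torch.stft` / `torch.istft` provide the equivalent operations.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Complex spectrogram (frames x bins) of a 1-D signal; Hann window, no padding."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape (n_frames, n_fft // 2 + 1), complex

def istft(spec, n_fft=1024, hop=256):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * window  # apply synthesis window
    length = n_fft + hop * (len(frames) - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Normalize by the summed squared window wherever it is not ~0 (i.e. away from the edges).
    nonzero = norm > 1e-8
    y[nonzero] /= norm[nonzero]
    return y

# Round trip on a 1-second, 440 Hz tone at 24 kHz.
sr = 24_000
x = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = stft(x)                            # complex: one value per (frame, bin)
magnitude, phase = np.abs(spec), np.angle(spec)
y = istft(spec)
interior = slice(1024, 20000)             # skip edges where the window sum is ~0
print(np.max(np.abs(y[interior] - x[interior])))
```

Separating `magnitude` and `phase` is only for inspection; the inverse transform consumes the complex values directly.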
Method
ComVo is built on the following components:
- Complex-domain modeling: the generator and discriminator operate on complex-valued representations.
- Adversarial training in the complex domain: the discriminator provides feedback directly on complex spectrograms.
- Phase quantization: phase values are discretized to regularize learning and guide phase transformation.
- Block-matrix computation: a structured computation scheme that reduces redundant operations.
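The block-matrix and phase-quantization ideas can be illustrated generically. A complex product W x with W = A + iB and x = p + iq equals (Ap - Bq) + i(Bp + Aq), so it can be computed as one real matrix product with the block matrix [[A, -B], [B, A]] instead of four separate real products. The sketch below assumes this is the kind of restructuring meant by block-matrix computation, and it uses uniform rounding of the phase as a stand-in for phase quantization. Both function names are hypothetical and neither is taken from the ComVo code. A trainable quantizer would also need a gradient workaround such as a straight-through estimator, which is not shown.

```python
import numpy as np

def complex_linear_block(weight, x):
    """Compute weight @ x for complex arrays using a single real block matrix (hypothetical helper).

    weight: complex array of shape (out_features, in_features)
    x:      complex array of shape (in_features,) or (in_features, batch)
    """
    a, b = weight.real, weight.imag
    # [[A, -B], [B, A]] @ [p; q] = [A p - B q; B p + A q] = [Re(Wx); Im(Wx)]
    block = np.block([[a, -b], [b, a]])
    stacked = np.concatenate([x.real, x.imag], axis=0)
    out = block @ stacked
    n = weight.shape[0]
    return out[:n] + 1j * out[n:]

def quantize_phase(z, levels=8):
    """Snap each phase of z to the nearest of `levels` uniformly spaced angles (hypothetical helper).

    Magnitudes are left unchanged; uniform rounding is an assumption for illustration.
    """
    step = 2 * np.pi / levels
    phase = np.round(np.angle(z) / step) * step
    return np.abs(z) * np.exp(1j * phase)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.allclose(complex_linear_block(w, x), w @ x))  # True: matches native complex matmul
print(quantize_phase(np.array([1 + 0.1j, 1j]), levels=4))
```

Because the block form uses only real arithmetic, it is also one way to run complex layers where complex dtypes are poorly supported.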
Model Details
- Architecture: GAN-based neural vocoder
- Representation: Complex spectrogram
- Sampling rate: 24 kHz
- Framework: PyTorch ≥ 2.0
Usage
Installation
pip install -r requirements.txt
Inference
python infer.py \
-c configs/configs.yaml \
--ckpt /path/to/comvo.ckpt \
--wavfile /path/to/input.wav \
--out_dir ./results
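This card does not say whether `infer.py` resamples its input. Because the checkpoint is trained for 24 kHz audio, a quick pre-check such as the sketch below can catch mismatched inputs before inference. `check_sample_rate` is a hypothetical helper, not part of the repository, and it relies only on the standard-library `wave` module, so it handles PCM WAV files and may reject other encodings.

```python
import wave

EXPECTED_SR = 24_000  # the pretrained ComVo checkpoint targets 24 kHz audio

def check_sample_rate(path, expected=EXPECTED_SR):
    """Return the sample rate of a PCM WAV file, or raise ValueError if it is not `expected`.

    Non-PCM files (e.g. 32-bit float WAV) may fail to open with `wave` on some Python versions.
    """
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
    if sr != expected:
        raise ValueError(
            f"{path}: {sr} Hz, expected {expected} Hz; resample before running infer.py"
        )
    return sr
```

For example, call `check_sample_rate("/path/to/input.wav")` before running the inference command.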
Training
python train.py -c configs/configs.yaml
Configuration details are specified in configs/configs.yaml.
Pretrained Model
A pretrained checkpoint is provided for inference.
- Checkpoint: https://works.do/xM2ttS4
- Configuration: configs/configs.yaml
- Sampling rate: 24 kHz
Please ensure that the configuration file matches the checkpoint when running inference.
Limitations
- The model is trained for 24 kHz audio and may not generalize to other sampling rates
- GPU is recommended for efficient inference and training
- Complex-valued operations may not be fully supported in all deployment environments
Citation
@inproceedings{
oh2026toward,
title={Toward Complex-Valued Neural Networks for Waveform Generation},
author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026},
url={https://openreview.net/forum?id=U4GXPqm3Va}
}
Acknowledgements
For additional details, please refer to the paper and the project page with audio samples.