ComVo: Complex-Valued Neural Vocoder for Waveform Generation

[ICLR 2026] Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

Overview

ComVo is a neural vocoder for waveform generation based on iSTFT.
It models complex-valued spectrograms and synthesizes waveforms via inverse short-time Fourier transform.

Conventional iSTFT-based vocoders typically process real and imaginary components separately.
ComVo instead operates in the complex domain, allowing the model to capture structural relationships between magnitude and phase more effectively.
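The round trip between a waveform and its complex spectrogram can be illustrated with a minimal sketch. It is written in dependency-free Python for clarity and is not ComVo's code: the frame length, hop size, Hann window, and all function names are illustrative assumptions, and a PyTorch implementation would normally rely on torch.stft and torch.istft instead. Each spectrogram bin is a single complex number, which is the quantity a complex-valued generator would predict directly.

```python
import cmath
import math

N_FFT = 8  # frame length (illustrative; real vocoders use far larger frames)
HOP = 4    # hop size, giving 50% overlap between frames


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft(signal):
    """Windowed frames -> list of complex spectra (one list per frame)."""
    w = hann(N_FFT)
    return [dft([signal[s + i] * w[i] for i in range(N_FFT)])
            for s in range(0, len(signal) - N_FFT + 1, HOP)]


def istft(frames, length):
    """Overlap-add synthesis with window-squared normalization."""
    w = hann(N_FFT)
    out = [0.0] * length
    norm = [0.0] * length
    for j, spec in enumerate(frames):
        frame = idft(spec)
        for i in range(N_FFT):
            out[j * HOP + i] += frame[i] * w[i]
            norm[j * HOP + i] += w[i] ** 2
    # Samples covered only by a zero window value cannot be recovered.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spec = stft(signal)                 # complex spectrogram: frames x bins
recon = istft(spec, len(signal))    # waveform synthesized from complex bins
# Skip sample 0, where the Hann window is exactly zero.
max_err = max(abs(a - b) for a, b in zip(signal[1:], recon[1:]))
```

In a vocoder, spec would come from the generator rather than from an analysis of the target signal, so the quality of the output waveform depends on how well magnitude and phase are predicted jointly.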

Method

ComVo is built on the following components:

Complex-domain modeling: The generator and discriminator operate on complex-valued representations.
Adversarial training in the complex domain: The discriminator provides feedback directly on complex spectrograms.
Phase quantization: Phase values are discretized to regularize learning and guide phase transformation.
Block-matrix computation: A structured computation scheme that reduces redundant operations.
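Two of the components above, phase quantization and block-matrix computation, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the uniform quantization grid, the number of levels, and the helper names quantize_phase, complex_matvec, and block_matvec are hypothetical, and the block form shown is the standard real-valued representation of a complex linear map, which may differ from the scheme used in the paper.

```python
import cmath
import math


def quantize_phase(z, levels=8):
    """Snap the phase of z to the nearest of `levels` uniform angles, keeping |z|.

    The uniform grid and the default of 8 levels are assumptions for illustration.
    """
    step = 2 * math.pi / levels
    q = round(cmath.phase(z) / step) * step
    return cmath.rect(abs(z), q)


def complex_matvec(W, v):
    """Reference: multiply a complex matrix by a complex vector directly."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def block_matvec(W, v):
    """Same product computed as one real matrix-vector multiply.

    With W = A + iB and v = x + iy, the real block matrix [[A, -B], [B, A]]
    applied to [x; y] yields [Ax - By; Bx + Ay], i.e. the real and imaginary
    parts of W v.
    """
    A = [[w.real for w in row] for row in W]
    B = [[w.imag for w in row] for row in W]
    n_out = len(W)
    top = [a_row + [-b for b in b_row] for a_row, b_row in zip(A, B)]
    bottom = [b_row + a_row for a_row, b_row in zip(A, B)]
    M = top + bottom
    u = [x.real for x in v] + [x.imag for x in v]
    r = [sum(m * x for m, x in zip(row, u)) for row in M]
    return [complex(r[i], r[i + n_out]) for i in range(n_out)]


W = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
v = [1 - 1j, 2 + 3j]
direct = complex_matvec(W, v)
blocked = block_matvec(W, v)       # agrees with `direct` up to rounding

q = quantize_phase(cmath.rect(2.0, 0.8), levels=8)  # phase snaps to pi/4
```

Folding the four real products (Ax, By, Bx, Ay) into a single real matrix multiply is one plausible way such a scheme can avoid redundant operations, since the whole complex layer becomes one call to an optimized real-valued kernel.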

Model Details

Architecture: GAN-based neural vocoder
Representation: Complex spectrogram
Sampling rate: 24 kHz
Framework: PyTorch ≥ 2.0

Usage

Installation

pip install -r requirements.txt

Inference

python infer.py \
  -c configs/configs.yaml \
  --ckpt /path/to/comvo.ckpt \
  --wavfile /path/to/input.wav \
  --out_dir ./results

Training

python train.py -c configs/configs.yaml

Configuration details are specified in configs/configs.yaml.

Pretrained Model

A pretrained checkpoint is provided for inference.

Checkpoint: https://works.do/xM2ttS4
Configuration: configs/configs.yaml
Sampling rate: 24 kHz

Please ensure that the configuration file matches the checkpoint when running inference.

Limitations

The model is trained for 24 kHz audio and may not generalize to other sampling rates.
A GPU is recommended for efficient inference and training.
Complex-valued operations may not be fully supported in all deployment environments.
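Because of the 24 kHz constraint, it can help to check an input file's sampling rate before running inference. The following is a small sketch using only Python's standard wave module. The helper names are hypothetical and not part of this repository, and the check only reads PCM WAV headers; it does not resample.

```python
import wave

EXPECTED_SR = 24000  # sampling rate stated in this README


def wav_sample_rate(path):
    """Return the sampling rate in Hz of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getframerate()


def matches_model_rate(path, expected=EXPECTED_SR):
    """True if the file's sampling rate equals the rate the model was trained on."""
    return wav_sample_rate(path) == expected
```

A file that fails this check would need to be resampled to 24 kHz with an audio tool of your choice before being passed to infer.py.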

Citation

@inproceedings{
  oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://openreview.net/forum?id=U4GXPqm3Va}
}

Acknowledgements

For additional details, please refer to the paper and the project page with audio samples.
