ComVo: Complex-Valued Neural Vocoder for Waveform Generation

[ICLR 2026] Toward Complex-Valued Neural Networks for Waveform Generation
Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee


Overview

ComVo is an iSTFT-based neural vocoder for waveform generation.
It predicts a complex-valued spectrogram and synthesizes the waveform with the inverse short-time Fourier transform (iSTFT).

Conventional iSTFT-based vocoders typically process real and imaginary components separately.
ComVo instead operates in the complex domain, allowing the model to capture structural relationships between magnitude and phase more effectively.
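
The final synthesis step described above can be illustrated with a minimal PyTorch sketch. It is not the repository's code: the function name complex_spec_to_waveform and the values n_fft=1024 and hop_length=256 are assumptions chosen for illustration, and the actual settings are whatever configs/configs.yaml specifies. The round trip through torch.stft only stands in for a generator output.

```python
import torch


def complex_spec_to_waveform(spec, n_fft=1024, hop_length=256, length=None):
    """Invert a complex spectrogram of shape (batch, n_fft // 2 + 1, frames).

    n_fft and hop_length are assumed values, not taken from configs/configs.yaml.
    """
    window = torch.hann_window(n_fft, device=spec.device)
    return torch.istft(
        spec,
        n_fft=n_fft,
        hop_length=hop_length,
        win_length=n_fft,
        window=window,
        length=length,
    )


# Stand-in for a generator output: analyse one second of noise at 24 kHz,
# then resynthesize it from the complex spectrogram.
torch.manual_seed(0)
wave = torch.randn(1, 24000)
spec = torch.stft(
    wave,
    n_fft=1024,
    hop_length=256,
    window=torch.hann_window(1024),
    return_complex=True,
)
recon = complex_spec_to_waveform(spec, length=wave.shape[-1])
```

Because the spectrogram is a single complex tensor, magnitude and phase are inverted together, with no separate real and imaginary branches to recombine.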


Method

ComVo is built on the following components:

  • Complex-domain modeling
    The generator and discriminator operate on complex-valued representations.

  • Adversarial training in the complex domain
    The discriminator provides feedback directly on complex spectrograms.

  • Phase quantization
    Phase values are discretized to regularize learning and guide phase transformation.

  • Block-matrix computation
    A structured computation scheme that reduces redundant operations.
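
Two of the components above can be illustrated with generic textbook constructions. The sketch below is an assumption-laden illustration, not the paper's formulation: complex_linear_as_block shows the standard identity that a complex matrix product equals one real product with the block matrix [[A, -B], [B, A]], and quantize_phase snaps each phase to a uniform grid while keeping the magnitude. Both function names and the levels parameter are hypothetical, and the scheme ComVo actually uses may differ.

```python
import math

import torch


def complex_linear_as_block(weight, z):
    """Compute weight @ z for complex tensors with a single real matmul.

    weight: complex tensor of shape (out, in); z: complex tensor of shape (in, k).
    """
    a, b = weight.real, weight.imag
    # [[A, -B], [B, A]] applied to [x; y] yields [Re(Wz); Im(Wz)].
    block = torch.cat(
        [torch.cat([a, -b], dim=1), torch.cat([b, a], dim=1)], dim=0
    )
    xy = torch.cat([z.real, z.imag], dim=0)
    out = block @ xy
    n = weight.shape[0]
    return torch.complex(out[:n], out[n:])


def quantize_phase(z, levels=8):
    """Snap each entry's phase to the nearest of `levels` uniform angles.

    Magnitudes are left unchanged. `levels` is an illustrative value.
    """
    step = 2 * math.pi / levels
    phase_q = torch.round(torch.angle(z) / step) * step
    return torch.polar(torch.abs(z), phase_q)
```

The block form replaces four separate real products (A x, B y, B x, A y) with one larger product, which is one way a structured scheme can avoid redundant operations.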


Model Details

  • Architecture: GAN-based neural vocoder
  • Representation: Complex spectrogram
  • Sampling rate: 24 kHz
  • Framework: PyTorch ≥ 2.0

Usage

Installation

pip install -r requirements.txt

Inference

python infer.py \
  -c configs/configs.yaml \
  --ckpt /path/to/comvo.ckpt \
  --wavfile /path/to/input.wav \
  --out_dir ./results

Training

python train.py -c configs/configs.yaml

Configuration details are specified in configs/configs.yaml.

Pretrained Model

A pretrained checkpoint is provided for inference.

Please ensure that the configuration file matches the checkpoint when running inference.


Limitations

  • The model is trained on 24 kHz audio and may not generalize to other sampling rates.
  • A GPU is recommended for efficient inference and training.
  • Complex-valued operations may not be fully supported in all deployment environments.
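
Given the 24 kHz limitation, it can help to check an input file's sampling rate before passing it to infer.py. The sketch below uses only the Python standard library. The helper check_sample_rate is hypothetical and not part of this repository, and the wave module reads uncompressed PCM WAV files only.

```python
import wave

EXPECTED_SR = 24000  # sampling rate stated under Model Details


def check_sample_rate(path, expected=EXPECTED_SR):
    """Return the WAV file's sample rate, or raise ValueError on a mismatch."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
    if sr != expected:
        raise ValueError(
            f"{path}: sample rate is {sr} Hz, expected {expected} Hz; "
            "resample the file first"
        )
    return sr
```

A file that fails the check would need to be resampled to 24 kHz with an audio tool of your choice before inference.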

Citation

@inproceedings{
  oh2026toward,
  title={Toward Complex-Valued Neural Networks for Waveform Generation},
  author={Hyung-Seok Oh and Deok-Hyeon Cho and Seung-Bin Kim and Seong-Whan Lee},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://openreview.net/forum?id=U4GXPqm3Va}
}

Acknowledgements

For additional details, please refer to the paper and the project page with audio samples.
