# TISDiSS: Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation

Official implementation of TISDiSS, a scalable framework for discriminative source separation that enables flexible model scaling at both training and inference time.
## Highlights
- **State-of-the-art Performance**: Achieves SOTA results on the WSJ0-2mix, WHAMR!, and Libri2Mix datasets
- **Dynamic Inference**: The number of Reconstruction block repetitions (N_re) can be adjusted at inference time, enabling performance-efficiency trade-offs without retraining
- **Effective Training Strategy for Low-Latency Separation**: Training with more inference repetitions consistently improves shallow-inference performance, offering a practical solution for low-latency separation
## Paper

- arXiv: https://arxiv.org/abs/2509.15666
- Hugging Face paper page: https://huggingface.co/papers/2509.15666
- Status: Submitted to ICASSP 2026
## Quick Start

```bash
git clone https://github.com/WingSingFung/TISDiSS.git
cd TISDiSS
```
### Environment Setup

Install the required dependencies:

```bash
pip install -r requirements.txt
```
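Optionally, set up an isolated environment before installing the requirements. This is a minimal sketch assuming a standard `venv` workflow; the environment name is a placeholder and the README does not pin a Python version:

```bash
# Create and activate a virtual environment (name is illustrative),
# then install the requirements as shown above
python3 -m venv tisdiss-env
source tisdiss-env/bin/activate
```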
### Inference

Navigate to the example directory and run inference on your audio files:

```bash
cd egs2/wsj0_2mix/enh1
python separate.py \
    --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
    --audio_path /path/to/input_audio \
    --audio_output_dir /path/to/output_directory
```
Parameters:

- `--model_path`: Path to the pre-trained model checkpoint
- `--audio_path`: Path to an input audio file or directory
- `--audio_output_dir`: Directory where the separated audio will be saved
## Training

### 1. Data Preparation

Navigate to the example directory:

```bash
cd egs2/wsj0_2mix/enh1
```
Note: You need to download the WSJ0 dataset separately (commercial license required).
#### Option A: WSJ0 in WAV Format

If your WSJ0 dataset is already in WAV format, create a symbolic link:

```bash
mkdir -p ./data/wsj0
ln -s /path/to/your/WSJ0 ./data/wsj0/wsj0
```

Alternatively, modify line 24 in `./local/data.sh` to point to your WSJ0 path:

```bash
wsj_full_wav=/path/to/your/WSJ0/
```
#### Option B: WSJ0 in Original Format

If your dataset is in the original WSJ0 format:

- Uncomment lines 76-81 in `./egs2/wsj0_2mix/enh1/local/data.sh`
- Fill in the `WSJ0=` path in `db.sh` (see the sketch after this list)
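A minimal sketch of the `db.sh` entry, assuming your original-format WSJ0 corpus lives under the placeholder path below (the `WSJ0` variable name comes from the recipe; the path is illustrative):

```bash
# In egs2/wsj0_2mix/enh1/db.sh
WSJ0=/path/to/your/wsj0_original   # root of the original (non-WAV) WSJ0 corpus
```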
### 2. Preprocessing

Run data preparation and statistics collection:

```bash
./run.sh --stage 1 --stop_stage 5
```
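If a stage fails (for example, because the WSJ0 path is wrong), it can be rerun in isolation with the same flags. This sketch assumes stage 1 handles data preparation within the recipe's stage 1-5 preprocessing range:

```bash
# Re-run only data preparation after fixing the WSJ0 path
./run.sh --stage 1 --stop_stage 1

# Then continue with the remaining preprocessing and statistics stages
./run.sh --stage 2 --stop_stage 5
```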
### 3. Model Training

Train the TISDiSS model with the following command:

```bash
CUDA_VISIBLE_DEVICES=1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 1
```
Training Configuration:

- The model uses TF-Locoformer as the backbone
- This configuration uses 2 Encoder blocks + 6 Reconstruction blocks
- Adjust `--ngpu` to use multiple GPUs if available (see the example below)
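For a multi-GPU run, only `CUDA_VISIBLE_DEVICES` and `--ngpu` need to change relative to the command above; the GPU indices here are placeholders:

```bash
# Example: train on two GPUs
CUDA_VISIBLE_DEVICES=0,1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 2
```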
### 4. Inference with Different Scalability Settings

Run inference with various Reconstruction block configurations (N_re):

```bash
./infer_run.sh
```

You can modify the script to test different N_re values:

```bash
for re in 3 6 8; do
    # Your inference commands here
done
```
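As a concrete sketch (not the repository's actual `infer_run.sh`), the loop body could run separation once per setting. The `--n_re` flag is an assumption standing in for however the script selects the number of Reconstruction-block repetitions; check `infer_run.sh` and the inference config for the real option name:

```bash
# Hypothetical sketch: --n_re is NOT a documented flag of separate.py;
# it represents whatever mechanism infer_run.sh uses to set N_re at inference time.
for re in 3 6 8; do
    python separate.py \
        --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
        --audio_path /path/to/input_audio \
        --audio_output_dir /path/to/output_directory/n_re_${re} \
        --n_re ${re}
done
```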
## Citation

If you find this work useful in your research, please consider citing:

```bibtex
@article{feng2025tisdiss,
  title={TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation},
  author={Feng, Yongsheng and Xu, Yuetonghui and Luo, Jiehui and Liu, Hongjia and Li, Xiaobing and Yu, Feng and Li, Wei},
  journal={arXiv preprint arXiv:2509.15666},
  year={2025}
}
```