# TISDiSS: Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation

Official implementation of TISDiSS, a scalable framework for discriminative source separation that enables flexible model scaling at both training and inference time.
## Highlights
- **State-of-the-art Performance**: Achieves SOTA results on the WSJ0-2mix, WHAMR!, and Libri2Mix datasets
- **Dynamic Inference**: The number of Reconstruction block repetitions (N_re) can be adjusted at inference time, enabling performance-efficiency trade-offs without retraining
- **Effective Training Strategy for Low-Latency Separation**: Training with more inference repetitions consistently improves shallow-inference performance, offering a practical solution for low-latency separation
## Paper

- arXiv: https://arxiv.org/abs/2509.15666
- Hugging Face paper page: https://huggingface.co/papers/2509.15666
- Status: Submitted to ICASSP 2026
## Quick Start

```bash
git clone https://github.com/WingSingFung/TISDiSS.git
cd TISDiSS
```
### Environment Setup

Install the required dependencies:

```bash
pip install -r requirements.txt
```
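Optionally, set up an isolated environment before installing the requirements. This is a minimal sketch assuming a standard `venv` workflow; the environment name is a placeholder and the README does not pin a Python version:

```bash
# Create and activate a virtual environment (name is illustrative),
# then install the requirements as shown above
python3 -m venv tisdiss-env
source tisdiss-env/bin/activate
```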
### Inference

Navigate to the example directory and run inference on your audio files:

```bash
cd egs2/wsj0_2mix/enh1
python separate.py \
    --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
    --audio_path /path/to/input_audio \
    --audio_output_dir /path/to/output_directory
```
Parameters:

- `--model_path`: Path to the pre-trained model checkpoint
- `--audio_path`: Path to an input audio file or directory
- `--audio_output_dir`: Directory where the separated audio will be saved
## Training

### 1. Data Preparation

Navigate to the example directory:

```bash
cd egs2/wsj0_2mix/enh1
```
Note: You need to download the WSJ0 dataset separately (commercial license required).
#### Option A: WSJ0 in WAV Format

If your WSJ0 dataset is already in WAV format, create a symbolic link:

```bash
mkdir -p ./data/wsj0
ln -s /path/to/your/WSJ0 ./data/wsj0/wsj0
```

Alternatively, modify line 24 in `./local/data.sh` to point to your WSJ0 path:

```bash
wsj_full_wav=/path/to/your/WSJ0/
```
#### Option B: WSJ0 in Original Format

If your dataset is in the original WSJ0 format:

- Uncomment lines 76-81 in `./egs2/wsj0_2mix/enh1/local/data.sh`
- Fill in the `WSJ0=` path in `db.sh` (see the sketch after this list)
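A minimal sketch of the `db.sh` entry, assuming your original-format WSJ0 corpus lives under the placeholder path below (the `WSJ0` variable name comes from the recipe; the path is illustrative):

```bash
# In egs2/wsj0_2mix/enh1/db.sh
WSJ0=/path/to/your/wsj0_original   # root of the original (non-WAV) WSJ0 corpus
```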
### 2. Preprocessing

Run data preparation and statistics collection:

```bash
./run.sh --stage 1 --stop_stage 5
```
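If a stage fails (for example, because the WSJ0 path is wrong), it can be rerun in isolation with the same flags. This sketch assumes stage 1 handles data preparation within the recipe's stage 1-5 preprocessing range:

```bash
# Re-run only data preparation after fixing the WSJ0 path
./run.sh --stage 1 --stop_stage 1

# Then continue with the remaining preprocessing and statistics stages
./run.sh --stage 2 --stop_stage 5
```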
### 3. Model Training

Train the TISDiSS model with the following command:

```bash
CUDA_VISIBLE_DEVICES=1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 1
```
Training Configuration:

- The model uses TF-Locoformer as the backbone
- This configuration uses 2 Encoder blocks + 6 Reconstruction blocks
- Adjust `--ngpu` to use multiple GPUs if available (see the example below)
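For a multi-GPU run, only `CUDA_VISIBLE_DEVICES` and `--ngpu` need to change relative to the command above; the GPU indices here are placeholders:

```bash
# Example: train on two GPUs
CUDA_VISIBLE_DEVICES=0,1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 2
```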
### 4. Inference with Different Scalability Settings

Run inference with various Reconstruction block configurations (N_re):

```bash
./infer_run.sh
```

You can modify the script to test different N_re values:

```bash
for re in 3 6 8; do
    # Your inference commands here
done
```
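As a concrete sketch (not the repository's actual `infer_run.sh`), the loop body could run separation once per setting. The `--n_re` flag is an assumption standing in for however the script selects the number of Reconstruction-block repetitions; check `infer_run.sh` and the inference config for the real option name:

```bash
# Hypothetical sketch: --n_re is NOT a documented flag of separate.py;
# it represents whatever mechanism infer_run.sh uses to set N_re at inference time.
for re in 3 6 8; do
    python separate.py \
        --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
        --audio_path /path/to/input_audio \
        --audio_output_dir /path/to/output_directory/n_re_${re} \
        --n_re ${re}
done
```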
## Citation

If you find this work useful in your research, please consider citing:

```bibtex
@article{feng2025tisdiss,
  title={TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation},
  author={Feng, Yongsheng and Xu, Yuetonghui and Luo, Jiehui and Liu, Hongjia and Li, Xiaobing and Yu, Feng and Li, Wei},
  journal={arXiv preprint arXiv:2509.15666},
  year={2025}
}
```