WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Longhao Li¹, Zhao Guo¹, Hongjie Chen², Yuhang Dai¹, Ziyu Zhang¹, Hongfei Xue¹, Tianlun Zuo¹, Chengyou Wang¹, Shuiyuan Wang¹, Xin Xu³, Hui Bu³, Jie Li², Jian Kang², Binbin Zhang⁴, Ruibin Yuan⁵, Ziya Zhou⁵, Wei Xue⁵, Lei Xie¹

¹ ASLP, Northwestern Polytechnical University, ² Institute of Artificial Intelligence (TeleAI), China Telecom, ³ Beijing AISHELL Technology Co., Ltd., ⁴ WeNet Open Source Community, ⁵ Hong Kong University of Science and Technology

📑 Paper    |    🐙 GitHub    |    🤗 HuggingFace
🖥️ HuggingFace Space    |    🎤 Demo Page    |    💬 Contact Us

## ASR Leaderboard

Dialogue and Reading are in-house test sets; yue, HK, MDCC, Daily_Use, and Commands are open-source test sets; Short and Long are the WSYue-eval splits.

| Model | #Params (M) | Dialogue | Reading | yue | HK | MDCC | Daily_Use | Commands | Short | Long |
|---|---|---|---|---|---|---|---|---|---|---|
| **w/o LLM** | | | | | | | | | | |
| Conformer-Yue⭐ | 130 | 16.57 | 7.82 | 7.72 | 11.42 | 5.73 | 5.73 | 8.97 | 5.05 | 8.89 |
| Paraformer | 220 | 83.22 | 51.97 | 70.16 | 68.49 | 47.67 | 79.31 | 69.32 | 73.64 | 89.00 |
| SenseVoice-small | 234 | 21.08 | 6.52 | 8.05 | 7.34 | 6.34 | 5.74 | 6.65 | 6.69 | 9.95 |
| SenseVoice-s-Yue⭐ | 234 | 19.19 | 6.71 | 6.87 | 8.68 | 5.43 | 5.24 | 6.93 | 5.23 | 8.63 |
| Dolphin-small | 372 | 59.20 | 7.38 | 39.69 | 51.29 | 26.39 | 7.21 | 9.68 | 32.32 | 58.20 |
| TeleASR | 700 | 37.18 | 7.27 | 7.02 | 7.88 | 6.25 | 8.02 | 5.98 | 6.23 | 11.33 |
| Whisper-medium | 769 | 75.50 | 68.69 | 59.44 | 62.50 | 62.31 | 64.41 | 80.41 | 80.82 | 50.96 |
| Whisper-m-Yue⭐ | 769 | 18.69 | 6.86 | 6.86 | 11.03 | 5.49 | 4.70 | 8.51 | 5.05 | 8.05 |
| FireRedASR-AED-L | 1100 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 55.37 | 50.26 |
| Whisper-large-v3 | 1550 | 45.09 | 15.46 | 12.85 | 16.36 | 14.63 | 17.84 | 20.70 | 12.95 | 26.86 |
| **w/ LLM** | | | | | | | | | | |
| Qwen2.5-Omni-3B | 3000 | 72.01 | 7.49 | 12.59 | 11.75 | 38.91 | 10.59 | 25.78 | 67.95 | 88.46 |
| Kimi-Audio | 7000 | 68.65 | 24.34 | 40.90 | 38.72 | 30.72 | 44.29 | 45.54 | 50.86 | 33.49 |
| FireRedASR-LLM-L | 8300 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 49.87 | 45.92 |
| Conformer-LLM-Yue⭐ | 4200 | 17.22 | 6.21 | 6.23 | 9.52 | 4.35 | 4.57 | 6.98 | 4.73 | 7.91 |

## ASR Inference

### U2pp_Conformer_Yue

```shell
dir=u2pp_conformer_yue
decode_checkpoint=$dir/u2pp_conformer_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention_rescoring \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --ctc_weight 0.5 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```
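In `attention_rescoring` mode, WeNet's CTC prefix beam search proposes n-best hypotheses and each one is rescored by interpolating its CTC score with the attention-decoder score, weighted by `--ctc_weight` (0.5 above). A minimal conceptual sketch of that interpolation; the scores below are illustrative placeholders, not real model outputs:

```python
# Sketch of attention-rescoring score interpolation (illustrative only).
# Each hypothesis carries a CTC log-score and an attention-decoder log-score;
# the final ranking uses their weighted sum.

def rescore(hyps, ctc_weight=0.5):
    """hyps: list of (text, ctc_score, att_score); returns the best text."""
    best = max(hyps, key=lambda h: ctc_weight * h[1] + (1 - ctc_weight) * h[2])
    return best[0]

# Placeholder n-best list: two Cantonese candidates with made-up scores.
nbest = [("今日天氣好", -4.2, -3.1), ("今日天气好", -4.0, -5.0)]
print(rescore(nbest, ctc_weight=0.5))  # → 今日天氣好 (-3.65 beats -4.5)
```

With `ctc_weight=1.0` the ranking reduces to pure CTC scores; with `0.0` it is pure attention-decoder rescoring.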

### Whisper_Medium_Yue

```shell
dir=whisper_medium_yue
decode_checkpoint=$dir/whisper_medium_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --blank_penalty 0.0 \
  --ctc_weight 0.0 \
  --reverse_weight 0.0 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```
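Both commands write hypothesis transcripts under `$test_result_dir`, which can then be scored against references with character error rate (CER), the usual metric for Cantonese ASR. A minimal pure-Python sketch of CER, with no dependency on WeNet's own scoring tools:

```python
# Minimal CER sketch: edit distance over characters divided by reference
# length. Suitable for spot-checking Cantonese hypotheses against references.

def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit_distance(ref, hyp) / len(ref)."""
    # Classic dynamic-programming Levenshtein distance, row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)

# One substituted character out of six: 气 vs 氣.
print(f"{cer('今日天氣好好', '今日天气好好'):.3f}")  # → 0.167
```

For corpus-level CER, sum edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.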

### SenseVoice_Small_Yue

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "sensevoice_small_yue"
wav_path = "path/to/audio.wav"

# Load the fine-tuned SenseVoice-small checkpoint.
model = AutoModel(
    model=model_dir,
    device="cuda:0",
)
res = model.generate(
    input=wav_path,
    cache={},
    language="yue",
    use_itn=True,
    batch_size=64,
)
# Strip SenseVoice's special tokens from the raw output.
text = rich_transcription_postprocess(res[0]["text"])
print(text)
```