Poor results with model_380000.pt on Common Voice data, seeking help

#10
by mushouxiaoer - opened

Hi there.
Thank you for your model.

I have a question about this model. I tried it with some data and it did not work very well, so I think there may be a mistake in my steps.

model: model_380000.pt
vocab: vocab.txt (from this hub)
ref text: هَذَا سُؤَالٌ جَيِّدٌ
ref audio:

gen text: هَذَا سُؤَالٌ جَيِّدٌ (same as the ref text)

I downloaded all of these and used src/f5_tts/train/finetune_gradio.py to run the inference.

The final generated audio (the ref audio was 24 kHz) is:

When I tried the same audio at 16 kHz, the result was better, but still not as good as the files in your README. Like this:

Could you help me with this, or could you run the same inference with model_380000.pt?

Thanks a lot.

Please try this notebook: https://colab.research.google.com/drive/1kX7HB05CouHa5A-4Wy0UPqMuW4APqDBr?usp=sharing and look at Update 1 on the model card.

Thank you very much.

Maybe my environment was not the same as yours.

I tried it in Colab and it worked pretty well.

I will try your repo for this task.

Thanks again.

mushouxiaoer changed discussion status to closed
