Poor results with model_380000.pt on Common Voice data, seeking help
Hi there.
Thank you for your model.
I have a question about this model. I tried it with some data and it does not work very well, so I think there may be an error in my steps.
model : model_380000.pt
vocab: vocab.txt (from this hub)
ref text: هَذَا سُؤَالٌ جَيِّدٌ
ref audio:
gen text: هَذَا سُؤَالٌ جَيِّدٌ (same as the ref text)
I downloaded all of these models and used src/f5_tts/train/finetune_gradio.py to run the inference.
The final generated audio (the ref audio was 24 kHz) is:
When I tried the same audio at 16 kHz, the result was better, but not as good as the samples in your README, like this:
Could you help me with this, or could you run the same inference with model_380000.pt?
Thanks a lot.
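On the 24 kHz versus 16 kHz difference mentioned above, here is a minimal sketch for confirming the sample rate a reference clip actually has before running inference. It assumes the clip is a PCM WAV file and uses only the Python standard library. The helper names `wav_sample_rate` and `write_silent_wav` are hypothetical and are not part of F5-TTS.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate, seconds=1):
    """Write a mono 16-bit silent WAV file (a stand-in clip for trying the check)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)
```

Calling `wav_sample_rate` on the 24 kHz and 16 kHz versions of the same clip shows whether the two files really differ in the way the file names suggest, which helps rule out a mislabeled reference file.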
Please try this notebook: https://colab.research.google.com/drive/1kX7HB05CouHa5A-4Wy0UPqMuW4APqDBr?usp=sharing and look at Update 1 on the model card.
Thank you very much.
Maybe my environment was not the same as yours.
I tried it in Colab and it worked pretty well.
I will try your repo for the task.
Thanks again.