Poor results with model_380000.pt using Common Voice data, seeking help

#10

by mushouxiaoer - opened Jun 11

Jun 11

Hi, there.
Thank you for your model.

I have a question about this model. I tried it with some data and it does not work very well, so I think there may be an error in my steps.

model : model_380000.pt
vocab: vocab.txt (from this hub)
ref text: هَذَا سُؤَالٌ جَيِّدٌ
ref audio:

gen text: هَذَا سُؤَالٌ جَيِّدٌ (same as the ref text)

I downloaded all of these and used src/f5_tts/train/finetune_gradio.py to run the inference.

The final generated audio (the ref audio was 24 kHz) is:

When I tried the same audio at 16 kHz, the result was better, but still not as good as the files in your README. Like this:

Could you help me with this, or could you run the same inference using model_380000.pt?

Thanks a lot.
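
For anyone reproducing the 24 kHz versus 16 kHz comparison above, it can help to confirm what sample rate a reference file actually has before running inference. Below is a minimal sketch using only Python's standard `wave` module. The helper names `write_test_tone` and `wav_info` are hypothetical and not part of F5-TTS, and the sketch assumes uncompressed PCM WAV files.

```python
import math
import struct
import wave


def write_test_tone(path, sample_rate, seconds=0.1, freq=440.0):
    """Write a short mono 16-bit sine tone so there is a file to inspect.

    Hypothetical helper for illustration only; not part of F5-TTS.
    """
    n_frames = round(sample_rate * seconds)
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate
```

Calling `wav_info` on the reference clip shows whether it is really 24 kHz or 16 kHz, which makes it easier to describe the setup precisely when comparing results across environments.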

Owner Jun 11

Jun 13

Thank you very much.

Maybe my environment was not the same as yours.

I tried it in Colab, and it worked pretty well.

I will try your repo for this task.

Thanks again.

mushouxiaoer changed discussion status to closed Jun 13
