Additional pre-trained voices
Nice Demo NihalGazi
I assume you're using the OpenAI Text-to-Speech (TTS) API?
- OpenAI's API does not seem to support an emotion parameter (emotion=ENCODED_EMOTION). Which TTS API endpoint are you using?
- Where are you getting the additional pre-trained voices,
like "coral", "verse", "ballad", "ash", "sage", "amuch", "dan"?
- Are you using any additional open-source TTS projects (like Bark, Coqui TTS, or a custom Hugging Face Space)?
- How does the OpenAI TTS API know how to handle these additionally selected voices?
Hey there! Thanks for the feedback.
Beyond OpenAI's default TTS model, there is another, smaller model called gpt-4o-mini-tts, which offers additional voices that are much more customisable than the default ones.
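For reference, a minimal sketch of how one of those extra voices might be selected, assuming the official OpenAI Python SDK (openai >= 1.x); the voice name, instructions text, and output filename are illustrative, and gpt-4o-mini-tts steers tone via a free-text `instructions` field rather than an emotion= parameter:

```python
import os

# Request parameters for the speech endpoint (assumed names per the
# OpenAI audio API; "coral" is one of the newer voices mentioned above).
params = {
    "model": "gpt-4o-mini-tts",
    "voice": "coral",
    "input": "Welcome back! Great to see you again.",
    # Emotion/tone is prompted in plain text, not a dedicated parameter:
    "instructions": "Speak in a warm, excited tone.",
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    # Stream the synthesized audio straight to an mp3 file.
    with client.audio.speech.with_streaming_response.create(**params) as resp:
        resp.stream_to_file("greeting.mp3")
```

The guard on OPENAI_API_KEY just keeps the sketch from firing a network call when no key is configured.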
wait, how am i suppose to make my OWN voice? 🤨
You can't. This isn't actually "voice cloning"; it's plain text-to-speech. For voice cloning, you can check out XTTS, F5-TTS, or Chatterbox TTS.
I'll try to implement voice cloning ASAP.