Generate speech from text using a reference voice
Convert audio to a different voice
Generate Japanese speech from text