---
datasets:
- flexthink/audiomnist
pipeline_tag: text-to-speech
---
This is a basic audio diffusion model built on a U-Net. The weights and training code are included in this repository.
The model's `sample` method generates whichever spoken digit (0 to 9) you ask for.
The training data are Mel spectrograms, which I generated with the excellent audio diffusers code from Hugging Face.
The model code is based on the denoising-diffusion-pytorch repo at https://github.com/lucidrains/denoising-diffusion-pytorch
   
The sample images in the files are named `sample{epoch}_{sample#}_{digit}.jpg`. Each image has a corresponding audio file.
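
If you want to sort or filter the samples by epoch or digit, the naming pattern above can be parsed with a small helper. This is only a sketch: `parse_sample_name` and `SampleInfo` are hypothetical names, not part of the uploaded code, and it assumes all three placeholders are plain non-negative integers.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: sample{epoch}_{sample#}_{digit}.jpg, all fields integers.
SAMPLE_NAME = re.compile(
    r"^sample(?P<epoch>\d+)_(?P<sample>\d+)_(?P<digit>\d+)\.jpg$"
)


class SampleInfo(NamedTuple):
    epoch: int   # training epoch at which the sample was generated
    sample: int  # index of the sample within that epoch
    digit: int   # spoken digit the sample was conditioned on


def parse_sample_name(filename: str) -> Optional[SampleInfo]:
    """Return the fields encoded in a sample image name, or None if it does not match."""
    match = SAMPLE_NAME.match(filename)
    if match is None:
        return None
    return SampleInfo(
        epoch=int(match["epoch"]),
        sample=int(match["sample"]),
        digit=int(match["digit"]),
    )
```

For example, a file called `sample10_3_7.jpg` (a made-up name) would parse to epoch 10, sample 3, digit 7.
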
The audio is VERY quiet, so turn up your speakers to hear it. (Just don't forget to turn them back down afterwards!)
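
Instead of turning up the speakers, the quiet clips can also be boosted with peak normalization. The sketch below is not part of the uploaded code. It assumes the audio is stored as 16-bit PCM WAV (convert first if it is in another format), and `peak_normalize_wav` and its `target_peak` parameter are hypothetical names chosen for illustration.

```python
import array
import sys
import wave


def peak_normalize_wav(src_path, dst_path, target_peak=0.9):
    """Rescale a 16-bit PCM WAV so its loudest sample sits at target_peak of full scale.

    Returns the gain factor that was applied.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        samples = array.array("h", src.readframes(params.nframes))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    peak = max((abs(s) for s in samples), default=0)
    # Pure silence has nothing to amplify, so leave it untouched.
    gain = 1.0 if peak == 0 else target_peak * 32767 / peak

    # Apply the gain and clamp to the valid 16-bit range.
    scaled = array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )
    if sys.byteorder == "big":
        scaled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
    return gain
```

Peak normalization applies one constant gain to the whole clip, so it raises the volume without changing the shape of the waveform. Any noise in the generated audio is amplified by the same factor.
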