Needed something small for batched short-form transcription. This is a low-effort distillation of v0.3 into the base architecture, with a tokenizer swap and 128 mel bins. Not recommended for long form: the deletion rate (dr) got worse while substitution/insertion rates (sr/ir) improved, so WER is slightly better overall but the model skips too much text. Short-form quality sits between small and medium, and is a little better than moonshine; see the table below.

## Benchmarks

See v0.3 for more details.

cv8-filtered fixes test-set leakage with respect to cv20 (2178/4483).
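The filtering step can be sketched as below. The exact criterion used for cv8-filtered isn't stated here, so this assumes leakage means an exact transcript match against the overlapping cv20 train split; function and variable names are illustrative.

```python
def filter_leaked(test_items, train_transcripts):
    """Keep only test utterances whose transcript does not appear in the train set.

    test_items: list of (audio_path, transcript) pairs
    train_transcripts: iterable of transcripts from the overlapping train corpus
    """
    seen = {t.strip() for t in train_transcripts}
    return [(audio, text) for audio, text in test_items
            if text.strip() not in seen]

# Toy example: one of three test utterances overlaps the train transcripts.
train = ["こんにちは", "ありがとう"]
test = [("a.wav", "こんにちは"), ("b.wav", "さようなら"), ("c.wav", "おはよう")]
kept = filter_leaked(test, train)  # only b.wav and c.wav survive
```

In practice a fuzzier criterion (normalized text, or speaker overlap) may be more appropriate than exact string match.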

Metrics: cer = character error rate; sr / ir / dr = substitution / insertion / deletion rates (all %).

| model | fleurs cer | fleurs sr | fleurs ir | fleurs dr | cv8 cer | cv8 sr | cv8 ir | cv8 dr | cv8-filtered cer | cv8-filtered sr | cv8-filtered ir | cv8-filtered dr | jsut-basic cer | jsut-basic sr | jsut-basic ir | jsut-basic dr | reazon cer | reazon sr | reazon ir | reazon dr | bluearchive cer | bluearchive sr | bluearchive ir | bluearchive dr | nekopara cer | nekopara sr | nekopara ir | nekopara dr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tiny_b5_n5 (37.8M) | 34.872 | 25.283 | 7.106 | 2.482 | 34.412 | 24.491 | 7.392 | 2.529 | 36.807 | 25.709 | 8.525 | 2.573 | 31.475 | 23.028 | 6.754 | 1.693 | 53.727 | 26.712 | 20.666 | 6.348 | 32.597 | 17.671 | 12.559 | 2.366 | 60.719 | 32.518 | 21.223 | 6.978 |
| tiny_b5_n5_nt | 34.645 | 25.366 | 6.818 | 2.461 | 33.071 | 23.777 | 6.459 | 2.835 | 34.886 | 24.820 | 7.166 | 2.900 | 31.048 | 22.804 | 6.418 | 1.826 | 46.508 | 24.779 | 13.174 | 8.555 | 26.572 | 16.250 | 7.372 | 2.950 | 51.144 | 29.551 | 11.354 | 10.239 |
| base_b5_n5 (73.6M) | 22.811 | 16.903 | 3.855 | 2.053 | 23.723 | 17.283 | 4.268 | 2.173 | 25.653 | 18.559 | 4.760 | 2.334 | 22.426 | 16.994 | 4.012 | 1.420 | 40.646 | 18.565 | 16.556 | 5.526 | 17.958 | 11.096 | 4.815 | 2.046 | 44.135 | 26.304 | 10.779 | 7.052 |
| base_b5_n5_nt | 22.758 | 16.854 | 3.784 | 2.120 | 22.991 | 16.836 | 3.728 | 2.426 | 24.669 | 17.882 | 4.212 | 2.575 | 22.338 | 16.910 | 3.952 | 1.476 | 35.135 | 17.636 | 11.435 | 6.065 | 16.536 | 10.361 | 3.516 | 2.658 | 40.438 | 24.289 | 6.848 | 9.301 |
| small_b5_n5 (242M) | 12.177 | 9.052 | 1.707 | 1.419 | 14.460 | 10.320 | 2.276 | 1.865 | 15.731 | 11.007 | 2.758 | 1.966 | 13.856 | 10.593 | 2.114 | 1.149 | 28.569 | 11.154 | 13.162 | 4.253 | 10.923 | 6.613 | 2.718 | 1.593 | 31.519 | 18.896 | 5.427 | 7.196 |
| small_b5_n5_nt | 12.061 | 8.997 | 1.587 | 1.477 | 14.099 | 10.214 | 1.906 | 1.978 | 15.088 | 10.836 | 2.153 | 2.099 | 13.765 | 10.541 | 2.032 | 1.192 | 25.462 | 10.836 | 9.391 | 5.235 | 10.038 | 6.103 | 1.949 | 1.986 | 29.738 | 17.271 | 3.001 | 9.466 |
| medium_b5_n5 (764M) | 7.259 | 5.218 | 0.959 | 1.082 | 10.925 | 7.694 | 1.455 | 1.776 | 11.869 | 8.344 | 1.555 | 1.970 | 9.628 | 7.200 | 1.428 | 1.000 | 25.426 | 7.871 | 13.965 | 3.590 | 8.756 | 4.996 | 2.415 | 1.345 | 29.385 | 16.406 | 5.659 | 7.320 |
| medium_b5_n5_nt | 7.244 | 5.240 | 0.910 | 1.094 | 10.647 | 7.506 | 1.282 | 1.859 | 11.474 | 8.055 | 1.332 | 2.088 | 9.598 | 7.200 | 1.349 | 1.048 | 21.381 | 7.791 | 9.150 | 4.440 | 8.306 | 4.700 | 1.684 | 1.922 | 27.300 | 14.658 | 2.833 | 9.808 |
| moonshine_b1 (27M) | 13.149 | 9.129 | 1.872 | 2.148 | 9.685* | 6.693* | 1.375* | 1.618* | 11.664* | 8.046* | 1.751* | 1.867* | 10.570 | 7.694 | 1.901 | 0.975 | 10.410* | 4.786* | 2.745* | 2.879* | 17.938 | 7.170 | 2.564 | 8.204 | 43.224 | 17.754 | 6.988 | 18.482 |
| moonshine_b5 | 11.280 | 7.786 | 1.646 | 1.848 | 8.575* | 5.916* | 1.158* | 1.501* | 10.281* | 7.064* | 1.514* | 1.702* | 9.876 | 7.143 | 1.827 | 0.906 | 9.561* | 4.218* | 2.631* | 2.713* | 16.956 | 6.788 | 2.496 | 7.671 | 44.377 | 17.648 | 9.406 | 17.323 |
| moonshine_b5_n5 | 12.839 | 8.825 | 1.927 | 2.087 | 9.037* | 6.230* | 1.205* | 1.603* | 10.745* | 7.374* | 1.553* | 1.819* | 10.184 | 7.409 | 1.849 | 0.926 | 10.301* | 4.614* | 2.697* | 2.989* | 16.585 | 6.859 | 1.937 | 7.788 | 38.240 | 16.535 | 3.036 | 18.668 |
| base-ja_b5 (56.6M) | 11.102 | 7.701 | 1.480 | 1.921 | 10.704* | 7.403* | 1.639* | 1.661* | 13.084 | 8.890 | 2.237 | 1.957 | 9.283 | 6.964 | 1.233 | 1.087 | 18.041 | 7.054 | 7.272 | 3.715 | 8.586 | 4.969 | 2.055 | 1.562 | 22.512 | 12.687 | 3.312 | 6.513 |
| base-ja_b5_n5 | 11.145 | 7.734 | 1.480 | 1.931 | 10.692* | 7.407* | 1.622* | 1.664* | 13.057 | 8.897 | 2.201 | 1.959 | 9.292 | 6.972 | 1.231 | 1.088 | 17.929 | 7.049 | 7.159 | 3.721 | 8.591 | 4.978 | 2.058 | 1.555 | 22.491 | 12.656 | 3.295 | 6.541 |
| base-ja_b5_nt | 10.379 | 7.354 | 1.474 | 1.551 | 10.668* | 7.391* | 1.598* | 1.679* | 13.054 | 8.911 | 2.160 | 1.984 | 9.259 | 6.959 | 1.223 | 1.077 | 14.905 | 6.257 | 4.711 | 3.938 | 8.338 | 4.810 | 1.903 | 1.625 | 22.226 | 12.473 | 2.946 | 6.806 |
| base-ja_b5_n5_nt | 10.416 | 7.385 | 1.474 | 1.557 | 10.672* | 7.387* | 1.597* | 1.687* | 13.061 | 8.906 | 2.158 | 1.997 | 9.272 | 6.971 | 1.223 | 1.079 | 14.931 | 6.254 | 4.733 | 3.944 | 8.340 | 4.813 | 1.902 | 1.625 | 22.216 | 12.466 | 2.937 | 6.813 |
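For reference, the four metrics above can be reproduced from a character-level Levenshtein alignment: with substitution/insertion/deletion counts S, I, D against a reference of N characters, cer = (S+I+D)/N, sr = S/N, ir = I/N, dr = D/N. A minimal sketch (text normalization and tie-breaking between equal-cost alignments are assumptions, so exact numbers may differ slightly from the evaluation scripts used here):

```python
def char_error_rates(ref: str, hyp: str) -> dict:
    """Character-level cer/sr/ir/dr (in %) from a Levenshtein alignment."""
    m, n = len(ref), len(hyp)
    # each DP cell: (edit cost, substitutions, insertions, deletions)
    prev = [(j, 0, j, 0) for j in range(n + 1)]   # empty ref -> j insertions
    for i in range(1, m + 1):
        cur = [(i, 0, 0, i)]                      # empty hyp -> i deletions
        for j in range(1, n + 1):
            dc, ds, di, dd = prev[j - 1]          # match / substitute
            if ref[i - 1] != hyp[j - 1]:
                dc, ds = dc + 1, ds + 1
            ic, s2, i2, d2 = cur[j - 1]           # insert hyp[j-1]
            rc, s3, i3, d3 = prev[j]              # delete ref[i-1]
            cur.append(min((dc, ds, di, dd),
                           (ic + 1, s2, i2 + 1, d2),
                           (rc + 1, s3, i3, d3 + 1)))
        prev = cur
    cost, subs, ins, dels = prev[n]
    n_ref = max(m, 1)
    return {"cer": 100 * cost / n_ref, "sr": 100 * subs / n_ref,
            "ir": 100 * ins / n_ref, "dr": 100 * dels / n_ref}

char_error_rates("abcde", "abXd")
# {'cer': 40.0, 'sr': 20.0, 'ir': 0.0, 'dr': 20.0}
```

This decomposition is why the long-form caveat above matters: a model can trade substitutions and insertions for deletions (skipped text) and still show a lower overall error rate.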

## Acknowledgements

- Train sets: OOPPEENN, Reazon, 小虫哥_, Common Voice 20, deepghs
- Test sets: kotoba-tech, Saruwatari-lab, grider-withourai