---
title: README
emoji: 🌍
colorFrom: gray
colorTo: yellow
sdk: static
pinned: false
license: other
---

ivrit.ai is an effort to provide high-quality Hebrew datasets under a permissive license. We hope these datasets will enable first-class Hebrew support in AI models.

You can learn more about us at [ivrit.ai](https://ivrit.ai).

We are proud to present our latest achievements:

1) A state-of-the-art Hebrew speech-to-text model based on Whisper large-v3: https://huggingface.co/ivrit-ai/whisper-large-v3-ct2
2) A Hebrew speech-to-text model based on Whisper large-v3-turbo: https://huggingface.co/ivrit-ai/whisper-large-v3-turbo-ct2
3) Our newest comprehensive Hebrew language dataset: https://huggingface.co/datasets/ivrit-ai/crowd-transcribe-v5

Paper (Interspeech 2025): https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf

If you use our datasets or models, please cite:

```
@inproceedings{marmor2025building,
  title={Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing},
  author={Marmor, Yanir and Lifshitz, Yair and Snapir, Yoad and Misgav, Kinneret},
  booktitle={Proc. Interspeech 2025},
  pages={723--727},
  year={2025}
}
```