---
license: mit
tags:
- tokenizer
- sentencepiece
- monolingual
- snd
- vocab-128000
---

# Monolingual Tokenizer - Sindhi (Vocab 128000)

This is a monolingual tokenizer trained on Sindhi text with a vocabulary size of 128000.

## Usage

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monolingual-tokenizer-native-snd-vocab-128000")
```

## Files

- `snd.model`: SentencePiece model file
- `snd.vocab`: Vocabulary file
- `config.json`: Tokenizer configuration

## Training Details

- Language: Sindhi (snd)
- Vocabulary Size: 128000
- Model Type: SentencePiece Unigram
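
## Example

A short round trip is sketched below: it tokenizes a Sindhi sentence, encodes it to ids, and decodes it back. The repo id is taken from the usage snippet above and may need a namespace prefix, and the sample sentence is only illustrative.

```python
from transformers import AutoTokenizer

# Repo id taken from the usage snippet above; it may need a
# namespace prefix such as "<org>/monolingual-tokenizer-native-snd-vocab-128000"
tokenizer = AutoTokenizer.from_pretrained("monolingual-tokenizer-native-snd-vocab-128000")

text = "سنڌي ٻولي"  # illustrative sample: "Sindhi language"

tokens = tokenizer.tokenize(text)  # subword pieces
ids = tokenizer.encode(text)       # token ids
print(tokens)
print(ids)

# Decode back to text; skip any special tokens the config may add
print(tokenizer.decode(ids, skip_special_tokens=True))
```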
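
## Loading the SentencePiece Model Directly

Because the raw `snd.model` file ships with the repo, the tokenizer can also be used directly with the `sentencepiece` library. This is a minimal sketch assuming `snd.model` has already been downloaded to the working directory (for example via `huggingface_hub.hf_hub_download`).

```python
import sentencepiece as spm

# Assumes snd.model is in the working directory
sp = spm.SentencePieceProcessor(model_file="snd.model")

text = "سنڌي ٻولي"  # illustrative sample: "Sindhi language"

pieces = sp.encode(text, out_type=str)  # subword pieces
ids = sp.encode(text, out_type=int)     # integer ids
print(pieces, ids)

# Round-trip back to text
print(sp.decode(ids))

# Should report the advertised vocabulary size of 128000
print(sp.get_piece_size())
```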