This is a Unigram tokenizer trained on the Wikitext dataset. Refer to the `train_unigram.py` script in this repository for details on how it was trained.
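
For readers unfamiliar with the Unigram model, the following is a minimal sketch of how such a tokenizer picks a segmentation: each vocabulary piece has a log-probability, and the tokenizer chooses the split whose pieces have the highest total score (a Viterbi search). The vocabulary `TOY_VOCAB`, its probabilities, and the function name `unigram_segment` are hypothetical and made up for illustration; they are not taken from this repository or from `train_unigram.py`.

```python
import math

# Hypothetical toy vocabulary: piece -> log-probability.
# These values are invented for illustration, not taken from this repository.
TOY_VOCAB = {
    "uni": math.log(0.15),
    "un": math.log(0.10),
    "gram": math.log(0.10),
    "i": math.log(0.05),
    "unigram": math.log(0.01),
    "u": math.log(0.01),
    "n": math.log(0.01),
    "g": math.log(0.01),
    "r": math.log(0.01),
    "a": math.log(0.01),
    "m": math.log(0.01),
}


def unigram_segment(text, vocab):
    """Return (pieces, score) for the highest-scoring segmentation of text."""
    n = len(text)
    # best_score[k] is the best total log-probability for text[:k].
    best_score = [-math.inf] * (n + 1)
    # best_start[k] is where the last piece of that best segmentation begins.
    best_start = [0] * (n + 1)
    best_score[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab and best_score[start] > -math.inf:
                score = best_score[start] + vocab[piece]
                if score > best_score[end]:
                    best_score[end] = score
                    best_start[end] = start

    if best_score[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")

    # Walk the back-pointers from the end of the text to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = best_start[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1], best_score[n]


pieces, score = unigram_segment("unigram", TOY_VOCAB)
print(pieces)
```

With this toy vocabulary, splitting into `uni` and `gram` scores higher than keeping `unigram` as a single piece, because 0.15 × 0.10 exceeds 0.01. A trained tokenizer applies the same idea with the piece probabilities it learned during training.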