transformers
torch
datasets
pillow
numpy
sentencepiece