# t5-small-wikitext
t5-small trained on [wikitext/wikitext-103-raw-v1](wikitext/wikitest-103-raw-v1) for 50k steps (around 2 hours of training), following the training procedure from the [T5 paper](https://arxiv.org/pdf/1910.10683.pdf).
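The T5 paper pre-trains with a span-corruption (denoising) objective. This card does not say which objective or corruption rate was used here, so the following is only a minimal sketch that assumes that objective. It shows how one input/target pair is built. The span positions are passed in explicitly to keep the example deterministic, whereas the paper samples them randomly (about 15% of tokens, mean span length 3). The `span_corrupt` helper is hypothetical. The `<extra_id_N>` sentinel names follow the Hugging Face T5 tokenizer convention.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build a T5-style span-corruption example.

    Each (start, length) span is dropped from the input and replaced by one
    sentinel. The target lists each sentinel followed by the tokens it
    replaced, closed by one final sentinel.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0  # index of the first token not yet copied to the inputs
    for i, (start, length) in enumerate(sorted(spans)):
        if length <= 0 or start < pos or start + length > len(tokens):
            raise ValueError("spans must be non-empty, in range and non-overlapping")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[pos:start])  # keep the unmasked prefix
        inputs.append(sentinel)           # one sentinel per dropped span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])  # the dropped tokens
        pos = start + length
    inputs.extend(tokens[pos:])           # keep the unmasked suffix
    targets.append(f"<extra_id_{len(spans)}>")  # final closing sentinel
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 2), (8, 1)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

This reproduces the worked example from the T5 paper. The targets are short because they contain only the dropped spans, which keeps the pre-training cost low.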
* batch_size: 32
* max_seq_length: 128
* optim: Adafactor
* scheduler: inverse square root (10k warm-up steps)
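The T5 paper defines its inverse square root schedule as a learning rate of 1/sqrt(max(n, k)), where n is the current step and k is the number of warm-up steps. This holds the rate constant at 0.01 for the first 10k steps and then decays it. The following is a minimal sketch that assumes this run used the same formula with k = 10,000. The function name `inverse_sqrt_lr` is hypothetical.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Hypothetical helper: T5-style rate 1 / sqrt(max(step, warmup_steps))."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Constant 1/sqrt(warmup_steps) during warm-up, then decays as 1/sqrt(step).
    return 1.0 / math.sqrt(max(step, warmup_steps))


for step in (0, 10_000, 40_000, 50_000):
    print(step, inverse_sqrt_lr(step))
```

Under this assumption, the rate stays at 0.01 until step 10k, drops to 0.005 at step 40k, and ends at roughly 0.0045 at the final step (50k).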