Update README.md
README.md CHANGED
@@ -23,7 +23,7 @@ Hash token embeddings:
These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025) as recommended in the paper [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962).
-
+Below is a subset of GLUE scores on the dev set using the [script provided by Hugging Face Transformers](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py) with the following parameters.
```bash
python run_glue.py --model_name_or_path <model path> --task_name <task name> --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 32 --learning_rate 1e-4 --num_train_epochs 4 --output_dir outputs --trust_remote_code True
```
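For reference, here is a minimal sketch of loading one of these checkpoints directly with the Transformers API. The `<model path>` placeholder, the sequence-classification head, and the `num_labels` value are assumptions for illustration; `trust_remote_code=True` mirrors the `--trust_remote_code True` flag passed to `run_glue.py` above, which suggests the checkpoints bundle custom modeling code.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path; substitute one of the released checkpoints.
model_path = "<model path>"

# trust_remote_code=True lets Transformers execute the custom modeling code
# shipped with the checkpoint, matching the --trust_remote_code True flag above.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    trust_remote_code=True,
    num_labels=2,  # assumed binary task (e.g. SST-2); set to the chosen GLUE task's label count
)
```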