# Beyond Document Page Classification

We release the benchmarking code together with the proposed datasets:

* https://huggingface.co/datasets/bdpc/rvl_cdip_mp
* https://huggingface.co/datasets/bdpc/rvl_cdip_n_mp

For consistency, the benchmarking code is also hosted on Hugging Face as an anonymous model repository that can be cloned.
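
Since Hugging Face repositories are ordinary git repositories, the datasets listed above can also be fetched with `git`. The following is a minimal sketch, not part of the released code: the helper names `dataset_url` and `clone_dataset` are hypothetical, and it assumes that `git` and `git-lfs` are installed and that the large files in these repositories are stored with Git LFS.

```sh
# Base URL taken from the dataset links listed above.
HF_DATASETS_BASE="https://huggingface.co/datasets"

# Hypothetical helper: print the clone URL for a dataset in the bdpc namespace.
dataset_url() {
    echo "$HF_DATASETS_BASE/bdpc/$1"
}

# Hypothetical helper: enable Git LFS (assumed to be installed), then clone
# the dataset repository into the current directory.
clone_dataset() {
    git lfs install && git clone "$(dataset_url "$1")"
}
```

With these helpers defined, `clone_dataset rvl_cdip_mp` or `clone_dataset rvl_cdip_n_mp` would download the corresponding repository. Note that `git lfs install` updates the user's global git configuration.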

## Installation

The scripts require [Python >= 3.8](https://www.python.org/downloads/release/python-380/) to run.
We will create a fresh virtual environment (using virtualenvwrapper) in which to install all required packages.
```sh
mkvirtualenv -p /usr/bin/python3 BYD
```
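
To confirm that the interpreter in the new environment meets the 3.8 requirement stated above, a quick check along the following lines can help. This is a sketch only: the function name `version_at_least` is hypothetical, and it assumes that `python3` on the `PATH` is the environment's interpreter.

```sh
# Hypothetical helper: succeed when version $1 (MAJOR.MINOR[.PATCH])
# is at least version $2 (MAJOR.MINOR).
version_at_least() {
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%.*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Ask the interpreter for its MAJOR.MINOR version; fall back to 0.0 if
# python3 is missing so that the check below fails cleanly.
current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo 0.0)

if version_at_least "$current" 3.8; then
    echo "Python $current satisfies the >= 3.8 requirement"
else
    echo "Python $current is too old or missing; 3.8 or newer is required" >&2
fi
```

Comparing the major and minor components as integers avoids the pitfall of string comparison, where `3.10` would sort before `3.8`.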

Using Poetry and the provided `pyproject.toml`, we will install all required packages:
```sh
workon BYD
pip3 install poetry
poetry install
```

## Experiments

To replicate all experimental results from the paper, run `experiments.sh`:

```sh
./experiments.sh
```
