extend metadata (add language tag and link the publication) (#1)
- extend metadata (add language tag and link the publication) (638763247e86f09f068701ce6480812c5aa13926)
Co-authored-by: Verena Blaschke <verenablaschke@users.noreply.huggingface.co>
README.md
@@ -5,8 +5,10 @@ tags:
 model-index:
 - name: pixel-base-german
 results: []
-paper: https://
+paper: https://aclanthology.org/2025.coling-main.427/
 license: apache-2.0
+language:
+- de
 ---
 
 # PIXEL-base-german
@@ -15,7 +17,7 @@ license: apache-2.0
 
 We trained the model using the architecture and [codebase](https://github.com/xplip/pixel) proposed in the 2023 Rust et al. paper [Language Modelling with Pixels](https://arxiv.org/abs/2207.06991).
 
-This model
+This German model was introduced and evaluated in the paper [Evaluating Pixel Language Models on Non-Standardized Languages](https://aclanthology.org/2025.coling-main.427/), presented at COLING 2025.
 
 ## Model description
 *Description from [https://huggingface.co/Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base)*
@@ -133,5 +135,22 @@ model = PIXELForPreTraining.from_pretrained("amunozo/pixel-base-german", config=
 - Datasets 2.14.5
 - Tokenizers 0.13.3
 
+## Citation
+
+```
+@inproceedings{munoz-ortiz-etal-2025-evaluating,
+title = "Evaluating Pixel Language Models on Non-Standardized Languages",
+author = "Mu{\~n}oz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara",
+editor = "Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven",
+booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
+month = jan,
+year = "2025",
+address = "Abu Dhabi, UAE",
+publisher = "Association for Computational Linguistics",
+url = "https://aclanthology.org/2025.coling-main.427/",
+pages = "6412--6419",
+}
+```
+
 ## Acknowledgements
 This work was funded by the European Research Council (ERC) Consolidator Grant DIALECT 101043235; SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; Xunta de Galicia (ED431C 2024/02); GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033/ and by ERDF, EU; Grant PRE2021-097001 funded by MICIU/AEI/10.13039/501100011033 and by ESF+ (predoctoral training grant associated to project PID2020-113230RB-C21); LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF; and Centro de Investigación de Galicia ‘‘CITIC’’, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).