amunozo verenablaschke committed on
Commit efd7df5 · verified · 1 Parent(s): fe5104c

extend metadata (add language tag and link the publication) (#1)

- extend metadata (add language tag and link the publication) (638763247e86f09f068701ce6480812c5aa13926)


Co-authored-by: Verena Blaschke <verenablaschke@users.noreply.huggingface.co>
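
The metadata change described above can be sanity-checked with a minimal sketch like the one below. It assumes the updated front matter looks like the `CARD` string (a hypothetical copy; the exact indentation of the real README.md is not visible in this diff), and `read_front_matter` is an illustrative helper, not part of any Hugging Face library. It only handles flat `key: value` pairs and `- item` lists, so it is not a general YAML parser.

```python
import re

# Hypothetical copy of the updated front matter; indentation is an assumption.
CARD = """---
model-index:
- name: pixel-base-german
  results: []
paper: https://aclanthology.org/2025.coling-main.427/
license: apache-2.0
language:
- de
---

# PIXEL-base-german
"""


def read_front_matter(text):
    """Return top-level keys from a simple front matter block.

    Supports only `key: value` pairs and `- item` lists under a bare `key:`.
    Indented lines (nested mappings) are skipped.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match is None:
        return {}
    data = {}
    current_key = None  # key whose list is currently being filled
    for line in match.group(1).splitlines():
        if line.startswith("- "):
            if current_key is not None:
                data[current_key].append(line[2:].strip())
            continue
        if not line or line.startswith(" ") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value
            current_key = None
        else:
            data[key] = []
            current_key = key
    return data


meta = read_front_matter(CARD)
print(meta["language"], meta["paper"])
```

For real model cards, a full YAML parser is the safer choice; this sketch only illustrates which two fields the commit touches (`paper` and `language`).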

Files changed (1)
  1. README.md +21 -2
README.md CHANGED
@@ -5,8 +5,10 @@ tags:
  model-index:
  - name: pixel-base-german
  results: []
- paper: https://arxiv.org/abs/xxxx.xxxxx
+ paper: https://aclanthology.org/2025.coling-main.427/
  license: apache-2.0
+ language:
+ - de
  ---
 
  # PIXEL-base-german
@@ -15,7 +17,7 @@ license: apache-2.0
 
  We trained the model using the architecture and [codebase](https://github.com/xplip/pixel) proposed in the 2023 Rust et al. paper [Language Modelling with Pixels](https://arxiv.org/abs/2207.06991).
 
- This model has been used in the paper *Evaluating Pixel Language Models on Non-Standardized Languages*, accepted at COLING 2025.
+ This German model was introduced and evaluated in the paper [Evaluating Pixel Language Models on Non-Standardized Languages](https://aclanthology.org/2025.coling-main.427/), presented at COLING 2025.
 
  ## Model description
  *Description from [https://huggingface.co/Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base)*
@@ -133,5 +135,22 @@ model = PIXELForPreTraining.from_pretrained("amunozo/pixel-base-german", config=
  - Datasets 2.14.5
  - Tokenizers 0.13.3
 
+ ## Citation
+
+ ```
+ @inproceedings{munoz-ortiz-etal-2025-evaluating,
+ title = "Evaluating Pixel Language Models on Non-Standardized Languages",
+ author = "Mu{\~n}oz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara",
+ editor = "Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven",
+ booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
+ month = jan,
+ year = "2025",
+ address = "Abu Dhabi, UAE",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2025.coling-main.427/",
+ pages = "6412--6419",
+ }
+ ```
+
  ## Acknowledgements
  This work was funded by the European Research Council (ERC) Consolidator Grant DIALECT 101043235; SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; Xunta de Galicia (ED431C 2024/02); GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033/ and by ERDF, EU; Grant PRE2021-097001 funded by MICIU/AEI/10.13039/501100011033 and by ESF+ (predoctoral training grant associated to project PID2020-113230RB-C21); LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF; and Centro de Investigación de Galicia ‘‘CITIC’’, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).