extend metadata (add language tag and link the publication)
README.md CHANGED
    | @@ -5,8 +5,10 @@ tags: | |
| 5 | 
             
            model-index:
         | 
| 6 | 
             
            - name: pixel-base-german
         | 
| 7 | 
             
              results: []
         | 
| 8 | 
            -
              paper: https:// | 
| 9 | 
             
            license: apache-2.0
         | 
| 10 | 
             
            ---
         | 
| 11 |  | 
| 12 | 
             
            # PIXEL-base-german
         | 
| @@ -15,7 +17,7 @@ license: apache-2.0 | |
| 15 |  | 
| 16 | 
             
            We trained the model using the architecture and [codebase](https://github.com/xplip/pixel) proposed in the 2023 Rust et al. paper [Language Modelling with Pixels](https://arxiv.org/abs/2207.06991).
         | 
| 17 |  | 
| 18 | 
            -
            This model  | 
| 19 |  | 
| 20 | 
             
            ## Model description
         | 
| 21 | 
             
            *Description from [https://huggingface.co/Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base)* 
         | 
| @@ -133,5 +135,22 @@ model = PIXELForPreTraining.from_pretrained("amunozo/pixel-base-german", config= | |
| 133 | 
             
            - Datasets 2.14.5
         | 
| 134 | 
             
            - Tokenizers 0.13.3
         | 
| 135 |  | 
| 136 | 
             
            ## Acknowledgements
         | 
| 137 | 
             
            This work was funded by the European Research Council (ERC)  Consolidator Grant DIALECT 101043235; SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; Xunta de Galicia (ED431C 2024/02); GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033/ and by ERDF, EU; Grant PRE2021-097001 funded by MICIU/AEI/10.13039/501100011033 and by ESF+ (predoctoral training grant associated to project PID2020-113230RB-C21); LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF; and Centro de Investigación de Galicia ‘‘CITIC’’, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).
         | 
|  | |
| 5 | 
             
            model-index:
         | 
| 6 | 
             
            - name: pixel-base-german
         | 
| 7 | 
             
              results: []
         | 
| 8 | 
            +
              paper: https://aclanthology.org/2025.coling-main.427/
         | 
| 9 | 
             
            license: apache-2.0
         | 
| 10 | 
            +
            language:
         | 
| 11 | 
            +
            - de
         | 
| 12 | 
             
            ---
         | 
| 13 |  | 
| 14 | 
             
            # PIXEL-base-german
         | 
|  | |
| 17 |  | 
| 18 | 
             
            We trained the model using the architecture and [codebase](https://github.com/xplip/pixel) proposed in the 2023 Rust et al. paper [Language Modelling with Pixels](https://arxiv.org/abs/2207.06991).
         | 
| 19 |  | 
| 20 | 
            +
            This German model was introduced and evaluated in the paper [Evaluating Pixel Language Models on Non-Standardized Languages](https://aclanthology.org/2025.coling-main.427/), presented at COLING 2025.
         | 
| 21 |  | 
| 22 | 
             
            ## Model description
         | 
| 23 | 
             
            *Description from [https://huggingface.co/Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base)* 
         | 
|  | |
| 135 | 
             
            - Datasets 2.14.5
         | 
| 136 | 
             
            - Tokenizers 0.13.3
         | 
| 137 |  | 
| 138 | 
            +
            ## Citation
         | 
| 139 | 
            +
             | 
| 140 | 
            +
            ```
         | 
| 141 | 
            +
            @inproceedings{munoz-ortiz-etal-2025-evaluating,
         | 
| 142 | 
            +
                title = "Evaluating Pixel Language Models on Non-Standardized Languages",
         | 
| 143 | 
            +
                author = "Mu{\~n}oz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara",
         | 
| 144 | 
            +
                editor = "Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven",
         | 
| 145 | 
            +
                booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
         | 
| 146 | 
            +
                month = jan,
         | 
| 147 | 
            +
                year = "2025",
         | 
| 148 | 
            +
                address = "Abu Dhabi, UAE",
         | 
| 149 | 
            +
                publisher = "Association for Computational Linguistics",
         | 
| 150 | 
            +
                url = "https://aclanthology.org/2025.coling-main.427/",
         | 
| 151 | 
            +
                pages = "6412--6419",
         | 
| 152 | 
            +
            }
         | 
| 153 | 
            +
            ```
         | 
| 154 | 
            +
             | 
| 155 | 
             
            ## Acknowledgements
         | 
| 156 | 
             
            This work was funded by the European Research Council (ERC)  Consolidator Grant DIALECT 101043235; SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; Xunta de Galicia (ED431C 2024/02); GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033/ and by ERDF, EU; Grant PRE2021-097001 funded by MICIU/AEI/10.13039/501100011033 and by ESF+ (predoctoral training grant associated to project PID2020-113230RB-C21); LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF; and Centro de Investigación de Galicia ‘‘CITIC’’, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).
         | 
