Commit 1785b42 (verified) by GorkaUrbizu · 1 parent: b02b8f1

Update README.md

Files changed (1)
  1. README.md +37 -4
README.md CHANGED
@@ -1,10 +1,43 @@
  ---
  license: llama3.2
+ datasets:
+ - orai-nlp/ZelaiHandi
+ - HuggingFaceFW/fineweb
  language:
  - eu
  base_model:
  - meta-llama/Llama-3.2-1B
- datasets:
- - orai-nlp/ZelaiHandi
- - HuggingFaceFW/fineweb
- ---
+ ---
+
+ # Llama3.2-1B-eu continual
+
+ Llama-3.2 1B for Basque, continually pretrained on [ZelaiHandi-v1](https://huggingface.co/datasets/orai-nlp/ZelaiHandi) for 5 epochs.
+
+ 📝 Paper: [Sub-1B Language Models for Low-Resource Languages: Training Strategies and Insights for Basque](https://aclanthology.org/2025.mrl-main.35/), accepted at the [5th Multilingual Representation Learning (MRL) Workshop 2025](https://sigtyp.github.io/ws2025-mrl.html) (EMNLP).
+
+
+ ## Acknowledgments
+
+ The creation of this dataset has been partially funded by the Basque Government (ICL4LANG project, grant no. KK-2023/00094) and the European Union (EFA 104/01-LINGUATEC IA project, INTERREG POCTEFA 2021-2027 program).
+ Pre-training and fine-tuning of SLMs were conducted using the Hyperion system at the Donostia International Physics Center (DIPC).
+ Finally, we thank Idoia Davila Uzkudun for her contributions to manual data curation and evaluation.
+
+ ## Citation
+
+ If you use this model, please cite the following paper:
+
+ ```bibtex
+ @inproceedings{urbizu2025sub,
+ title={Sub-1B Language Models for Low-Resource Languages: Training Strategies and Insights for {B}asque},
+ author={Urbizu, Gorka and Corral, Ander and Saralegi, Xabier and San Vicente, I{\~n}aki},
+ booktitle={Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)},
+ pages={519--530},
+ year={2025}
+ }
+
+ ```
+
+ ## Contact
+
+ - Gorka Urbizu (g.urbizu@orai.eus)
+ - Xabier Saralegi (x.saralegi@orai.eus)
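
Note on the front-matter hunk: apart from the new README body, the metadata change only moves the `datasets` key above `language`. YAML mappings are unordered, so the parsed metadata should be the same before and after. The sketch below checks this under stated assumptions: `parse_front_matter` is a hypothetical helper written for this note (not part of any Hugging Face library), and it handles only the flat subset used here (top-level `key: value` pairs and `- item` lists), not general YAML.

```python
def parse_front_matter(text):
    """Parse a flat YAML subset: top-level `key: value` pairs and `- item` lists.

    Hypothetical helper for illustration only; real model cards should be
    parsed with a full YAML library.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    meta = {}
    list_key = None  # key currently collecting `- item` entries, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter: stop before the README body
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError("list item without a preceding list key")
            meta[list_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                list_key = None
            else:
                meta[key] = []
                list_key = key
    return meta


# Front matter before the commit (old side of the hunk).
OLD = """---
license: llama3.2
language:
- eu
base_model:
- meta-llama/Llama-3.2-1B
datasets:
- orai-nlp/ZelaiHandi
- HuggingFaceFW/fineweb
---
"""

# Front matter after the commit (new side), followed by the start of the body.
NEW = """---
license: llama3.2
datasets:
- orai-nlp/ZelaiHandi
- HuggingFaceFW/fineweb
language:
- eu
base_model:
- meta-llama/Llama-3.2-1B
---

# Llama3.2-1B-eu continual
"""

old_meta = parse_front_matter(OLD)
new_meta = parse_front_matter(NEW)
print(old_meta == new_meta)
```

Dictionary equality in Python ignores key order, which matches the unordered semantics of a YAML mapping, so the comparison reflects only the metadata content and not its position in the file.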