FineData

community

AI & ML interests

We release large pre-training datasets to accelerate open LLM development. We are part of the Hugging Face Science team (hf.co/science).

Recent Activity

joelniklaus updated a dataset 2 days ago

HuggingFaceFW/finephrase

joelniklaus new activity 7 days ago

HuggingFaceFW/finephrase:Intrinsic quality evaluation of 3000 examples using LLM-as-judge

joelniklaus updated a Space 14 days ago

HuggingFaceFW/finephrase

Papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

HuggingFaceFW's datasets (35)

HuggingFaceFW/finephrase

Viewer • Updated 2 days ago • 1.02B • 528k • 90

HuggingFaceFW/finepdfs_edu_50BT-dclm_30BT-fineweb_edu_20BT-shuffled

Viewer • Updated about 1 month ago • 56.1M • 894

HuggingFaceFW/finepdfs_edu_50BT-dclm_30BT-fineweb_edu_20BT

Viewer • Updated about 1 month ago • 56.1M • 11.1k

HuggingFaceFW/finepdfs_50BT-dclm_30BT-fineweb_edu_20BT-shuffled

Viewer • Updated about 1 month ago • 62.1M • 1.34k • 3

HuggingFaceFW/finepdfs_50BT-dclm_30BT-fineweb_edu_20BT

Viewer • Updated about 1 month ago • 62.1M • 15.2k • 1

HuggingFaceFW/finepdfs_edu_100BT-shuffled

Viewer • Updated about 1 month ago • 17.8M • 1.68k

HuggingFaceFW/finepdfs_edu_100BT

Viewer • Updated about 1 month ago • 17.8M • 1.9k

HuggingFaceFW/finepdfs_100BT-shuffled

Viewer • Updated about 1 month ago • 14.6M • 558

HuggingFaceFW/finepdfs_100BT

Viewer • Updated about 1 month ago • 29.9M • 1.95k

HuggingFaceFW/fineweb_edu_100BT-shuffled

Viewer • Updated about 1 month ago • 102M • 600

HuggingFaceFW/fineweb_edu_100BT

Preview • Updated about 1 month ago • 523

HuggingFaceFW/fineweb_100BT-shuffled

Viewer • Updated about 1 month ago • 161M • 450

HuggingFaceFW/fineweb_100BT

Viewer • Updated about 1 month ago • 161M • 956 • 1

HuggingFaceFW/dclm_100BT-shuffled

Viewer • Updated about 1 month ago • 89.3M • 2.33k • 1

HuggingFaceFW/dclm_100BT

Viewer • Updated about 1 month ago • 89.3M • 661

HuggingFaceFW/finetranslations-edu

Viewer • Updated Jan 9 • 109M • 1.49k • 26

HuggingFaceFW/finetranslations

Viewer • Updated Jan 9 • 3.33B • 36.7k • 277

HuggingFaceFW/admin

Viewer • Updated Jan 9 • 18 • 15.3k • 3

HuggingFaceFW/finepdfs

Viewer • Updated Jan 9 • 476M • 35.8k • 833

HuggingFaceFW/CommonsenseQA

Viewer • Updated Dec 30, 2025 • 1k • 33 • 1

HuggingFaceFW/MMLU-Redux-2.0-Generative

Viewer • Updated Dec 30, 2025 • 5.43k • 1.21k • 2

HuggingFaceFW/ARC-Generative

Viewer • Updated Dec 30, 2025 • 7.79k • 72

HuggingFaceFW/finepdfs-edu

Viewer • Updated Nov 11, 2025 • 49.5M • 6.52k • 85

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 38.4k • 775

HuggingFaceFW/finewiki

Viewer • Updated Oct 22, 2025 • 61.6M • 8.68k • 288

HuggingFaceFW/clean-wikipedia

Viewer • Updated Oct 21, 2025 • 61.2M • 1.53k • 24

HuggingFaceFW/finepdfs_lang_classification_tmp

Updated Oct 21, 2025 • 10

HuggingFaceFW/ocr-annotations

Viewer • Updated Oct 20, 2025 • 1.62k • 141 • 17

HuggingFaceFW/finepdfs_lang_classification

Viewer • Updated Oct 17, 2025 • 3.08M • 22.6k • 4

HuggingFaceFW/finepdfs_eng_Latn_labeled

Viewer • Updated Oct 6, 2025 • 1.3M • 170 • 3