Hugging Face

Common Crawl Foundation

Enterprise
non-profit
Verified
https://commoncrawl.org
commoncrawl

AI & ML interests

Crawled data and metadata

Recent Activity

pjox authored a paper about 13 hours ago
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
tvaughan updated a dataset 1 day ago
commoncrawl/statistics
malteos updated a Space 16 days ago
commoncrawl/cc-citations

Members: Thom Vaughan, Pedro Ortiz Suarez, Paul Lazar, Greg Lindahl, Ford H, Jen English, Sebastian Nagel, Jason Grey, Laurie Burchell, Hande Celikkanat, malteos, Thijs Dalhuijsen, d, Luca

commoncrawl's datasets (5)

commoncrawl/statistics

Viewer • Updated 1 day ago • 610k • 174 • 25

commoncrawl/gneissweb-annotation-host-testing-v1

Viewer • Updated Dec 11, 2025 • 617M • 21

commoncrawl/gneissweb-annotation-url-testing-v1

Viewer • Updated Dec 10, 2025 • 11.5B • 36

commoncrawl/citations

Viewer • Updated Oct 16, 2025 • 9.18k • 67 • 1

commoncrawl/eot2024_hostlevel_logs

Viewer • Updated Oct 9, 2024 • 271k • 7 • 1