Hugging Face
bluelightai-dev's Collections
Sampled Datasets

Updated Nov 11

Random samples from large datasets, for convenience.


  • bluelightai-dev/dclm-full-deduped-sample (updated Nov 11; 4.92M rows; 6 downloads)
  • bluelightai-dev/the-stack-dedup-sample (updated Nov 10; 474k rows; 20 downloads)
  • bluelightai-dev/common-corpus-sample-open-culture (updated Nov 11; 462k rows; 2 downloads)
  • bluelightai-dev/common-corpus-sample-open-government (updated Nov 11; 373k rows; 15 downloads; 1 like)
  • bluelightai-dev/common-corpus-sample-open-science (updated Nov 11; 284k rows; 7 downloads)
  • bluelightai-dev/common-corpus-sample-open-source (updated Nov 11; 2.02M rows; 6 downloads)
  • bluelightai-dev/common-corpus-sample-open-web (updated Nov 11; 4.8M rows; 56 downloads)
  • bluelightai-dev/MathPile_Commercial-formatted (updated Nov 12; 389k rows; 23 downloads)