bluelightai-dev's Collections
Sampled Datasets
Updated Nov 11
Random samples from large datasets, for convenience.
bluelightai-dev/dclm-full-deduped-sample • Updated Nov 11 • 4.92M • 6
bluelightai-dev/the-stack-dedup-sample • Updated Nov 10 • 474k • 20
bluelightai-dev/common-corpus-sample-open-culture • Updated Nov 11 • 462k • 2
bluelightai-dev/common-corpus-sample-open-government • Updated Nov 11 • 373k • 15 • 1
bluelightai-dev/common-corpus-sample-open-science • Updated Nov 11 • 284k • 7
bluelightai-dev/common-corpus-sample-open-source • Updated Nov 11 • 2.02M • 6
bluelightai-dev/common-corpus-sample-open-web • Updated Nov 11 • 4.8M • 56
bluelightai-dev/MathPile_Commercial-formatted • Updated Nov 12 • 389k • 23