anujga's Collections
PT
Updated Jun 24
allenai/peS2o • Updated Oct 13, 2024 • 4.32k • 185
allenai/dolmino-mix-1124 • Viewer • Updated Oct 29 • 170M • 48.1k • 88
allenai/olmo-mix-1124 • Viewer • Updated Aug 19 • 621M • 36.3k • 84
Locutusque/UltraTextbooks • Viewer • Updated Feb 2, 2024 • 5.52M • 2.44k • 196
PrimeIntellect/StackV1-popular • Viewer • Updated Oct 8, 2024 • 93M • 2.39k • 2
EleutherAI/reasoning-mix • Viewer • Updated Jan 24 • 11.7M • 219 • 5
EleutherAI/the_pile_deduplicated • Viewer • Updated Dec 2, 2022 • 134M • 16.4k • 106
HIT-TMG/KaLM-embedding-pretrain-data • Viewer • Updated Nov 27 • 23.7M • 2.07k • 16
suriyagunasekar/stackoverflow-with-meta-data • Viewer • Updated Feb 23, 2023 • 19.9M • 3.67k • 12
vesteinn/babylm • Viewer • Updated Jul 3, 2023 • 13.6M • 1.03k • 5
Salesforce/wikitext • Viewer • Updated Jan 4, 2024 • 3.71M • 866k • 544
gk4u/reddit_dataset_104 • Viewer • Updated Apr 7 • 474M • 2.55k • 4
EleutherAI/deep-ignorance-annealing-mix • Viewer • Updated Aug 12 • 89M • 3.03k • 1
Locutusque/TM-DATA-V2 • Viewer • Updated May 4, 2024 • 10.2M • 174 • 5
Skywork/SkyPile-150B • Viewer • Updated Dec 7, 2023 • 1.76M • 18.7k • 393
HuggingFaceTB/stack-edu • Viewer • Updated Mar 20 • 167M • 1.98k • 60
Locutusque/deeplm-training-data • Viewer • Updated Apr 11 • 2.17M • 103 • 3
nvidia/Llama-Nemotron-Post-Training-Dataset • Viewer • Updated May 8 • 3.91M • 5.3k • 623
LLM360/TxT360 • Updated May 26 • 42.1k • 247
EssentialAI/essential-web-v1.0 • Preview • Updated Oct 2 • 7.85k • 213