An LLM pre-training dataset containing only public domain and openly licensed text
Nikhil Kandpal
nkandpa2
Recent Activity
upvoted an article about 23 hours ago: Announcing the Common Pile and Comma v0.1
updated a dataset 5 days ago: common-pile/stackv2_edu_filtered
updated a dataset 5 days ago: common-pile/youtube_filtered
Collections 1
Papers 1
models 7
nkandpa2/comma-v0.1-checkpoints • 150
nkandpa2/comma-v0.1-stage2 • 4
nkandpa2/comma-v0.1-stage1 • 4
nkandpa2/comma-v0.1-checkpoint-hf • 11
nkandpa2/comma-v0.1-ablation-hf • 4
nkandpa2/comma-loss-test • Text Generation • 6
nkandpa2/Llama_3.2_1B__alpaca_finetune • 4
datasets 45
nkandpa2/code_dates_sorted • 218M • 93
nkandpa2/oer_dates_sorted • 646k • 80
nkandpa2/audio_dates_sorted • 1.13M • 83
nkandpa2/forum_dates_sorted • 64.7M • 86
nkandpa2/webtext_dates_sorted • 51.2M • 89
nkandpa2/wiki_dates_sorted • 283M • 111
nkandpa2/gov_dates_sorted • 19.6M • 88
nkandpa2/scientific_papers_dates_sorted • 13M • 87
nkandpa2/all_dates_sorted • 652M • 105
nkandpa2/all_dates • 652M • 113