common-pile/arxiv_abstracts_filtered
An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in Common Pile v0.1.
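The description does not say which filters or which deduplication method were applied. The sketch below is purely illustrative and is not the Common Pile pipeline. It assumes two hypothetical choices: a minimum word count as the filter, and exact deduplication by hashing normalized text. Both `min_words` and the normalization rule are placeholders.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    # Hypothetical normalization: Unicode NFC, lowercase, collapse runs of whitespace.
    text = unicodedata.normalize("NFC", text).lower()
    return " ".join(text.split())


def filter_and_dedup(docs, min_words=5):
    """Illustrative only: drop documents shorter than `min_words` words
    (a hypothetical filter), then drop exact duplicates after normalization,
    keeping the first occurrence of each."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short under the illustrative length filter
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate (after normalization) of an earlier document
        seen.add(digest)
        kept.append(doc)
    return kept


if __name__ == "__main__":
    abstracts = [
        "We study   deep networks for image classification.",
        "we study deep networks for image classification.",
        "Too short.",
        "A new bound on the sample complexity of learning halfspaces.",
    ]
    print(filter_and_dedup(abstracts))
```

Exact hashing only catches documents that are identical after normalization. Production pre-training pipelines often add near-duplicate detection (for example, MinHash), and this sketch does not attempt that.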