common-pile's Collections

Common Pile v0.1 Filtered Data

An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in Common Pile v0.1.