bluelightai-dev/clt-eval-modernbert-tokenized
• 219k • 3
bluelightai-dev/clt-train-modernbert-tokenized
• 1.94M • 21
bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256
• 212k • 52
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024
• 4.04M • 23
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3
• 1.81M • 291
bluelightai-dev/clt-pretrain-data-v3
• 2.99M • 30
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample
• 6.32M • 21
bluelightai-dev/dolma3_mix-150B-1025-sample
• 4.97M • 50
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3
• 115k • 27
bluelightai-dev/clt-mixed-eval-data
• 60k • 23
bluelightai-dev/clt-mixed-data-tokenized-Qwen3
• 2.6M • 42
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256
• 194k • 40
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024
• 2.52M • 55
bluelightai-dev/clt-pretrain-data-v2-dedup
• 19
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024
• 2.44M • 63
bluelightai-dev/clt-pretrain-data-v2
• 67
bluelightai-dev/MathPile_Commercial-formatted
• 389k • 58
bluelightai-dev/clt_posttrain_data_tokenized
• 1.34M • 56
bluelightai-dev/common-corpus-sample-open-web
• 4.8M • 38
bluelightai-dev/common-corpus-sample-open-source
• 2.02M • 32
bluelightai-dev/common-corpus-sample-open-science
• 284k • 36
bluelightai-dev/common-corpus-sample-open-government
• 373k • 35 • 1
bluelightai-dev/common-corpus-sample-open-culture
• 462k • 55
bluelightai-dev/clt_posttrain_data_tokenized_test_1000
• 1.22k • 10
bluelightai-dev/dclm-full-deduped-sample
• 4.92M • 54
bluelightai-dev/the-stack-dedup-sample
• 474k • 29
bluelightai-dev/pythia_clt_pretrain_data_tokenized
• 3.5M • 70
bluelightai-dev/clt_eval_data_qwen3_tokenized_256
• 245k • 84
bluelightai-dev/clt_pretrain_data_qwen_tokenized
• 16.7M • 132
bluelightai-dev/clt_posttrain_data_qwen_tokenized
• 1.34M • 86