The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Article • 28 days ago • 46
We’re open-sourcing our text-to-image model and the process behind it Article • 19 days ago • 71
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages Article • Jul 8 • 32
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Paper • 2507.02608 • Published Jul 3 • 21