MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale Paper • 2604.04771 • Published 4 days ago • 103
MMFineReason Collection High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 9 days ago • 22
OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking • Updated Feb 3 • 123k • 1.57k • 78
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published Jan 29 • 62
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility Paper • 2601.17027 • Published Jan 17 • 42
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Paper • 2512.14051 • Published Dec 16, 2025 • 47