MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model • Paper • 2603.26357 • Published 9 days ago • 4
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published 4 days ago • 22
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training • Paper • 2603.28858 • Published 6 days ago • 8
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization • Paper • 2603.19835 • Published 16 days ago • 315