Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Paper • 2601.21358 • Published 6 days ago • 6
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 2 days ago • 118
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published 2 days ago • 27
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 6 days ago • 32
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published 2 days ago • 34
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space Paper • 2602.02092 • Published 2 days ago • 17
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published 3 days ago • 43
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published 5 days ago • 28
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation Paper • 2601.22904 • Published 5 days ago • 13
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 6 days ago • 41
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation Paper • 2601.21406 • Published 6 days ago • 4
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 7 days ago • 109
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 6 days ago • 68
Less is More: Optimizing Function Calling for LLM Execution on Edge Devices Paper • 2411.15399 • Published Nov 23, 2024 • 1
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Article • 9 days ago • 46
iFSQ: Improving FSQ for Image Generation with 1 Line of Code Paper • 2601.17124 • Published 12 days ago • 31