Submitted by HelloJiang 165 Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention · 15 authors 10
Submitted by akhaliq 46 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? · 4 authors 5
Submitted by RunpeiDong 43 Learning Getting-Up Policies for Real-World Humanoid Robots · 4 authors 151 3
Submitted by Mifucius 37 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models · 8 authors 178 3
Submitted by Ningyu 23 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training · 8 authors 6
Submitted by nielsr 20 Intuitive physics understanding emerges from self-supervised pretraining on natural videos · 8 authors 172 2
Submitted by zhihz0535 20 IHEval: Evaluating Language Models on Following the Instruction Hierarchy · 14 authors 2
Submitted by dreamerdeo 18 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs · 41 authors 66 4
Submitted by comin 17 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation · 7 authors 63 2
Submitted by aboots 17 Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation · 8 authors 290 2
Submitted by comin 16 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening · 6 authors 3
Submitted by Minbyul 15 System Message Generation for User Preferences using Open-Source Models · 5 authors 2
Submitted by akhaliq 13 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems · 5 authors 56 2
Submitted by vardaan123 10 Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents · 8 authors 2
Submitted by WenDingY 10 The Mirage of Model Editing: Revisiting Evaluation in the Wild · 8 authors 2
Submitted by Bohan22 10 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors · 3 authors 2
Submitted by akhaliq 9 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model · 8 authors 2
Submitted by ingeol 8 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL · 4 authors 2
Submitted by ChengyouJia 7 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning · 9 authors 2
Submitted by gkakogeorgiou 7 EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling · 4 authors 2
Submitted by akhaliq 7 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs · 13 authors 1 2
Submitted by shizhuo2 6 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity · 3 authors 2
Submitted by KomeijiForce 6 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest · 4 authors 7 2
Submitted by avanturist 6 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning · 4 authors 23 2
Submitted by emrecanacikgoz 5 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model · 9 authors 2
Submitted by gretawarren 4 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking · 3 authors 2
Submitted by hammh0a 3 Towards Data-Efficient Pretraining for Atomic Property Prediction · 3 authors 3
Submitted by ishikaa 1 Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning · 2 authors 2
Submitted by birgermoell - Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance · 2 authors 2
Submitted by ryuryukke - ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability · 5 authors 2