MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published 14 days ago • 45
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 47
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3 • 50
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 102
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11 • 55
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14 • 17
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Paper • 2504.00487 • Published Apr 1 • 18
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 63
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 55
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21 • 45
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 52
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14 • 24
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 26
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 54
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond Paper • 2306.09841 • Published Jun 16, 2023 • 3
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30 • 25
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 88
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting Paper • 2411.17176 • Published Nov 26, 2024 • 24