RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Paper • 2509.16941 • Published Sep 21, 2025
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models Paper • 2303.08896 • Published Mar 15, 2023
From Local to Global: A Graph RAG Approach to Query-Focused Summarization Paper • 2404.16130 • Published Apr 24, 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models Paper • 2405.14831 • Published May 23, 2024
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published 25 days ago
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published 17 days ago
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 17 days ago