ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published 7 days ago • 22
RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models Paper • 2603.21341 • Published 21 days ago • 23
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published 20 days ago • 45
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 59
Physical AI Collection • Collection of open, commercial-grade datasets for physical AI developers • 29 items • Updated 2 days ago • 140
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning Paper • 2311.03736 • Published Nov 7, 2023 • 12