None defined yet.
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment