CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 1 day ago • 73
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 13 days ago • 143
view article Article AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems Dec 23, 2025 • 48
GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning Paper • 2508.15690 • Published Aug 21, 2025 • 8