CU-Benchmarks
visualwebbench/VisualWebBench • Viewer • 1.54k • 842 • 18
82 • 6
rootsautomation/RICO-ScreenQA • Viewer • 86k • 344 • 10
rootsautomation/ScreenSpot • Viewer • 1.27k • 1.78k • 43
Viewer • 1.27k • 393 • 7
Viewer • 1.59k • 3.41k • 42
Preview • 1.7k • 15
Preview • 2.1k • 25
Viewer • 168k • 355 • 5
Preview • 8
osunlp/Multimodal-Mind2Web • Viewer • 14.2k • 3.32k • 88
Viewer • 259 • 73 • 2
Viewer • 253 • 1.59k • 114
Viewer • 7.74k • 7.53k • 26
xlangai/ubuntu_osworld_file_cache • 347k • 2
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • Paper • 2409.08264 • 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents • Paper • 2405.14573
Viewer • 1.21k • 95 • 5