This is the AWQ-quantized version of UI-TARS-1.5-7B, built with AutoAWQ on an A100 (80 GB). It works with vLLM and LMDeploy.
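As a quick-start sketch, the quantized checkpoint can be served through either engine's OpenAI-compatible API server. The flags below reflect common vLLM/LMDeploy usage for AWQ checkpoints; verify them against the version you have installed.

```shell
# Serve with vLLM (detects AWQ from the model config; the flag makes it explicit)
vllm serve flin775/UI-TARS-1.5-7B-AWQ --quantization awq

# Or serve with LMDeploy, specifying the AWQ model format
lmdeploy serve api_server flin775/UI-TARS-1.5-7B-AWQ --model-format awq
```

Both commands expose an OpenAI-compatible endpoint, so any OpenAI-style client can send multimodal chat requests to the served model.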

Model Description

UI-TARS-1.5-7B is an open-source multimodal agent model released by ByteDance. It achieves state-of-the-art results across a variety of standard benchmarks, demonstrating strong reasoning capabilities and notable improvements over prior models.

Code: https://github.com/bytedance/UI-TARS

Application: https://github.com/bytedance/UI-TARS-desktop

Grounding Capability Evaluation

| Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|
| ScreenSpot-V2 | 94.2 | 87.9 | 87.6 | 91.6 |
| ScreenSpot-Pro | 61.6 | 23.4 | 27.7 | 43.6 |

Model Scale Comparison

This table compares performance across different model scales of UI-TARS on the OSWorld and ScreenSpot-Pro benchmarks.

| Benchmark Type | Benchmark | UI-TARS-72B-DPO | UI-TARS-1.5-7B | UI-TARS-1.5 |
|---|---|---|---|---|
| Computer Use | OSWorld | 24.6 | 27.5 | 42.5 |
| GUI Grounding | ScreenSpot-Pro | 38.1 | 49.6 | 61.6 |

The released UI-TARS-1.5-7B focuses primarily on enhancing general computer-use capabilities and is not specifically optimized for game-based scenarios, where UI-TARS-1.5 still holds a significant advantage.
