This is the AWQ-quantized version of UI-TARS-1.5-7B, built with AutoAWQ on an A100 (80 GB). It works with vLLM and LMDeploy.
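As a quick-start sketch, the quantized checkpoint can be served through either engine's OpenAI-compatible API server. The flags below reflect common vLLM/LMDeploy usage for AWQ checkpoints; verify them against the version you have installed.

```shell
# Serve with vLLM (detects AWQ from the model config; the flag makes it explicit)
vllm serve flin775/UI-TARS-1.5-7B-AWQ --quantization awq

# Or serve with LMDeploy, specifying the AWQ model format
lmdeploy serve api_server flin775/UI-TARS-1.5-7B-AWQ --model-format awq
```

Both commands expose an OpenAI-compatible endpoint, so any OpenAI-style client can send multimodal chat requests to the served model.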

Model Description

UI-TARS-1.5-7B is an open-source multimodal agent model released by ByteDance. It achieves state-of-the-art results across a variety of standard benchmarks, demonstrating strong reasoning capabilities and notable improvements over prior models.

Code: https://github.com/bytedance/UI-TARS

Application: https://github.com/bytedance/UI-TARS-desktop

Grounding Capability Evaluation

| Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|
| ScreenSpot-V2 | 94.2 | 87.9 | 87.6 | 91.6 |
| ScreenSpot-Pro | 61.6 | 23.4 | 27.7 | 43.6 |

Model Scale Comparison

This table compares performance across different model scales of UI-TARS on the OSWorld and ScreenSpot-Pro benchmarks.

| Benchmark Type | Benchmark | UI-TARS-72B-DPO | UI-TARS-1.5-7B | UI-TARS-1.5 |
|---|---|---|---|---|
| Computer Use | OSWorld | 24.6 | 27.5 | 42.5 |
| GUI Grounding | ScreenSpot-Pro | 38.1 | 49.6 | 61.6 |

The released UI-TARS-1.5-7B focuses primarily on enhancing general computer-use capabilities and is not specifically optimized for game-based scenarios, where UI-TARS-1.5 still holds a significant advantage.
