Tags: Image-Text-to-Text, Transformers, Safetensors, English, Chinese, text-generation, GUI, GUI-Grounding, Vision-language, multimodal, conversational, custom_code
How to use tencent/POINTS-GUI-G with libraries, inference providers, notebooks, and local apps:
- Libraries
- Transformers
How to use tencent/POINTS-GUI-G with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="tencent/POINTS-GUI-G", trust_remote_code=True)
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("tencent/POINTS-GUI-G", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tencent/POINTS-GUI-G with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "tencent/POINTS-GUI-G"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "tencent/POINTS-GUI-G",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image in one sentence."},
                        {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                    ]
                }
            ]
        }'

Use Docker
docker model run hf.co/tencent/POINTS-GUI-G
- SGLang
How to use tencent/POINTS-GUI-G with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "tencent/POINTS-GUI-G" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "tencent/POINTS-GUI-G",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image in one sentence."},
                        {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                    ]
                }
            ]
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "tencent/POINTS-GUI-G" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "tencent/POINTS-GUI-G",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image in one sentence."},
                        {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                    ]
                }
            ]
        }'

- Docker Model Runner
How to use tencent/POINTS-GUI-G with Docker Model Runner:
docker model run hf.co/tencent/POINTS-GUI-G
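The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server is already running on the port used in the corresponding example (8000 for `vllm serve`, 30000 for SGLang). The helper names `build_chat_request`, `extract_reply`, and `send` are hypothetical, and `extract_reply` assumes the standard OpenAI response shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Ports taken from the examples above (assumed unchanged on your machine).
VLLM_BASE_URL = "http://localhost:8000"
SGLANG_BASE_URL = "http://localhost:30000"
MODEL_ID = "tencent/POINTS-GUI-G"


def build_chat_request(base_url: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST request with the same JSON body as the curl examples."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text, assuming an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.loads(resp.read().decode("utf-8")))
```

Nothing is sent until `send` is called; for example, `send(build_chat_request(SGLANG_BASE_URL, "Describe this image in one sentence.", image_url))` would post to the SGLang server and return the model's text. Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live server.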
    from transformers import PretrainedConfig, Qwen3Config

    try:
        from transformers.models.qwen2_vl.configuration_qwen2_vl import Qwen2VLVisionConfig
    except ImportError as exc:
        # Qwen2VLVisionConfig is needed below, so fail immediately rather than
        # printing a warning and raising a NameError later.
        raise ImportError("Please upgrade transformers to version 4.46.3 or higher") from exc


    class POINTSGUIConfig(PretrainedConfig):
        """Configuration class for `POINTSGUI`."""

        model_type = "points_gui"
        is_composition = True

        def __init__(self, **kwargs) -> None:
            super().__init__(**kwargs)
            # Nothing else to set up when no arguments are given.
            if not kwargs:
                return
            vision_config = kwargs.pop("vision_config", None)
            llm_config = kwargs.pop("llm_config", None)

            # Sub-configs may arrive as plain dicts (e.g. from config.json) or as config objects.
            if isinstance(vision_config, dict):
                self.vision_config = Qwen2VLVisionConfig(**vision_config)
            else:
                self.vision_config = vision_config
            if isinstance(llm_config, dict):
                self.llm_config = Qwen3Config(**llm_config)
            else:
                self.llm_config = llm_config
                # Read the fields below from a plain dict view of the config object.
                llm_config = llm_config.to_dict()

            # Mirror the language-model fields on the top-level config.
            self.vocab_size = llm_config["vocab_size"]
            self.max_position_embeddings = llm_config["max_position_embeddings"]
            self.hidden_size = llm_config["hidden_size"]
            self.intermediate_size = llm_config["intermediate_size"]
            self.num_hidden_layers = llm_config["num_hidden_layers"]
            self.num_attention_heads = llm_config["num_attention_heads"]
            self.use_sliding_window = llm_config["use_sliding_window"]
            self.sliding_window = llm_config["sliding_window"]  # we check `use_sliding_window` in the modeling code
            self.max_window_layers = llm_config["max_window_layers"]

            # For backward compatibility: no KV-head count means one KV head per attention head.
            if llm_config.get("num_key_value_heads") is None:
                llm_config["num_key_value_heads"] = llm_config["num_attention_heads"]
            self.num_key_value_heads = llm_config["num_key_value_heads"]
            self.head_dim = llm_config["head_dim"]
            self.hidden_act = llm_config["hidden_act"]
            self.initializer_range = llm_config["initializer_range"]
            self.rms_norm_eps = llm_config["rms_norm_eps"]
            self.use_cache = llm_config["use_cache"]
            self.rope_theta = llm_config["rope_theta"]
            self.rope_scaling = llm_config["rope_scaling"]
            self.attention_bias = llm_config["attention_bias"]
            self.attention_dropout = llm_config["attention_dropout"]

            # BC: if rope_scaling has a legacy 'type' field, copy it to 'rope_type',
            # mapping 'mrope' to 'default'.
            if self.rope_scaling is not None and "type" in self.rope_scaling:
                if self.rope_scaling["type"] == "mrope":
                    self.rope_scaling["type"] = "default"
                self.rope_scaling["rope_type"] = self.rope_scaling["type"]

            # tie_word_embeddings comes from the language model; drop any top-level
            # copy so the keyword is not passed twice.
            kwargs.pop("tie_word_embeddings", None)
            super().__init__(
                tie_word_embeddings=llm_config["tie_word_embeddings"],
                **kwargs,
            )
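`POINTSGUIConfig` copies the language-model fields from `llm_config` onto the top-level config and applies two backward-compatibility rules along the way: a missing key/value head count falls back to the attention head count, and a legacy `rope_scaling["type"]` is copied to `rope_type`, with `"mrope"` mapped to `"default"`. The standalone sketch below reproduces just those two rules on a plain dict so they can be checked without `transformers`. `normalize_llm_fields` is a hypothetical name; it works on a copy rather than mutating its input and treats a missing `num_key_value_heads` key the same as `None`.

```python
import copy
from typing import Any, Dict, Optional


def normalize_llm_fields(llm_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring two backward-compatibility rules of POINTSGUIConfig.

    Operates on a deep copy, so the caller's dict is left untouched.
    """
    cfg = copy.deepcopy(llm_config)

    # Rule 1: no key/value head count means one KV head per attention head.
    if cfg.get("num_key_value_heads") is None:
        cfg["num_key_value_heads"] = cfg["num_attention_heads"]

    # Rule 2: legacy rope_scaling dicts carry "type"; "mrope" becomes "default",
    # and the (possibly rewritten) value is copied into "rope_type".
    rope_scaling: Optional[Dict[str, Any]] = cfg.get("rope_scaling")
    if rope_scaling is not None and "type" in rope_scaling:
        if rope_scaling["type"] == "mrope":
            rope_scaling["type"] = "default"
        rope_scaling["rope_type"] = rope_scaling["type"]

    return cfg
```

For instance, `normalize_llm_fields({"num_attention_heads": 32, "num_key_value_heads": None, "rope_scaling": {"type": "mrope"}})` returns a dict whose `num_key_value_heads` is 32 and whose `rope_scaling` is `{"type": "default", "rope_type": "default"}`. Working on a copy avoids the in-place mutation of the caller's dict that the class performs.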