Instructions for using CowCorpus/Cluster0-Collaborative-Llava with libraries, inference providers, and local apps. Follow the links below to get started.
- Libraries
- Transformers

How to use CowCorpus/Cluster0-Collaborative-Llava with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CowCorpus/Cluster0-Collaborative-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load the model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CowCorpus/Cluster0-Collaborative-Llava")
model = AutoModelForImageTextToText.from_pretrained("CowCorpus/Cluster0-Collaborative-Llava")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use CowCorpus/Cluster0-Collaborative-Llava with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "CowCorpus/Cluster0-Collaborative-Llava"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CowCorpus/Cluster0-Collaborative-Llava",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- SGLang
How to use CowCorpus/Cluster0-Collaborative-Llava with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "CowCorpus/Cluster0-Collaborative-Llava" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CowCorpus/Cluster0-Collaborative-Llava",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Or use the official Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "CowCorpus/Cluster0-Collaborative-Llava" \
    --host 0.0.0.0 \
    --port 30000
```

- Docker Model Runner
How to use CowCorpus/Cluster0-Collaborative-Llava with Docker Model Runner:
```shell
docker model run hf.co/CowCorpus/Cluster0-Collaborative-Llava
```
Model Card for CowCorpus/Cluster0-Collaborative-Llava
This model is a specialized fine-tune of the general CowCorpus-Llava model.
It was specifically further fine-tuned on Cluster 0 - Collaborative User data from the CowCorpus dataset to adapt to the specific intervention preferences and behavioral patterns of this user group.
This model is designed for the task of Human Intervention Prediction in collaborative web navigation. Unlike standard autonomous agents, it predicts when a Collaborative user (Cluster 0) needs to take control from an AI agent. It uses multimodal inputs (screenshots, DOM trees, and action history) to distinguish safe autonomous execution from moments that require human error correction, preference alignment, or assistance.
Model Details
Model Description
- Developed by: CowCorpus Team (Huq et al.)
- Model type: Multimodal Causal Language Model
- Parent Model: CowCorpus/CowCorpus-llama3-llava-next-8b
- Base model: lmms-lab/llama3-llava-next-8b
- Language: English
- License: Llama 3 Community License Agreement
- Paper: Modeling Distinct Human Interaction in Web Agents
- Repository: GitHub: oaishi/CowCorpus
Input Data
The model is trained on a rich, multimodal state representation:
- Visual Screenshot: The pixel-level view of the current webpage.
- UI Structure (AX Tree): The accessibility tree (textual representation of DOM).
- Past Trajectory: The history of actions taken by the agent/human so far.
- Proposed Next Action: The action that the autonomous agent intends to take. The model evaluates if this intent is erroneous.
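The exact prompt format is documented in the GitHub repository; purely as an illustration, the four state components above could be assembled into a chat-template message like this (the `build_intervention_prompt` helper and all field wording are hypothetical, not the training format):

```python
# Hypothetical sketch: packing the multimodal state into one user turn.
# Field wording is illustrative; see the GitHub repo for the real format.
def build_intervention_prompt(screenshot_url, ax_tree, past_actions, proposed_action):
    """Combine screenshot, AX tree, trajectory, and proposed action."""
    state_text = (
        "UI structure (AX tree):\n" + ax_tree + "\n\n"
        "Past trajectory:\n"
        + "\n".join(f"{i + 1}. {a}" for i, a in enumerate(past_actions))
        + "\n\nProposed next action: " + proposed_action
        + "\n\nShould the human take control now?"
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": screenshot_url},  # visual screenshot
                {"type": "text", "text": state_text},      # textual state
            ],
        }
    ]

messages = build_intervention_prompt(
    "https://example.com/step3.png",
    "button 'Checkout'  link 'Cart (2)'",
    ["click('Add to cart')", "click('Cart')"],
    "click('Checkout')",
)
```

A message list in this shape can be fed to `processor.apply_chat_template(...)` as in the Transformers snippet above.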
How to Get Started
For inference code, prompt templates, and setup instructions, please refer to our GitHub Repository.
Training Data
The model underwent a two-stage training process:
- Stage 1 (General Adaptation): Fine-tuned on the complete CowCorpus dataset.
- Stage 2 (User Personalization): Further fine-tuned on the User Cluster 0 subset of CowCorpus, which consists of 101 trajectories and 793 steps.
User Cluster 0 Characteristics:
- Data Source: A subset of the collaborative trajectories specific to User Group 0.
- Behavioral Profile: Collaborative users intervene rarely and modestly, usually late in the task, and show a strong tendency to hand control back to the agent.
Training Configuration
- Hyperparameters:
- Learning Rate: Linear decay from 1e-5 to ~2e-9
- Epochs: 6
- Global Steps: 120
- Batch Size: 1
- Precision: bfloat16
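A minimal sketch of the reported schedule, assuming plain linear interpolation from the initial to the final learning rate over the 120 global steps (the actual trainer implementation may differ, e.g. include warmup):

```python
# Linear decay from 1e-5 to ~2e-9 over 120 global steps (values from
# the card; interpolation shape is an assumption).
def linear_decay_lr(step, total_steps=120, lr_init=1e-5, lr_final=2e-9):
    frac = min(step, total_steps) / total_steps
    return lr_init + frac * (lr_final - lr_init)

print(linear_decay_lr(0))    # initial LR: 1e-05
print(linear_decay_lr(120))  # final LR: ~2e-09
```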
Evaluation: Cross-Cluster Personalization
We evaluate the model using the Perfect Timing Score (PTS), a metric designed to measure the temporal accuracy of intervention predictions.
Because this is a personalized model, we report Cross-Cluster PTS. This measures how well the model (trained on Cluster 0) performs on its own test data versus test data from other user clusters. High performance on the diagonal (matching train/test groups) indicates successful personalization.
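The precise PTS computation is defined in the paper; as rough intuition only, a toy timing metric might score the fraction of trajectories where the predicted intervention step exactly matches the human's actual step (`toy_timing_score` below is illustrative, not the paper's metric):

```python
# Illustrative only: the paper defines the real PTS. This toy variant
# counts exact step matches between predicted and actual interventions.
def toy_timing_score(predicted_steps, actual_steps):
    assert len(predicted_steps) == len(actual_steps)
    hits = sum(p == a for p, a in zip(predicted_steps, actual_steps))
    return hits / len(predicted_steps)

score = toy_timing_score([3, 5, 7], [3, 4, 7])  # 2 of 3 trajectories match
```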
Cross-Cluster PTS Heatmap
The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and Columns represent the User Cluster data it was tested on.
| Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: Takeover (User 3) |
|---|---|---|---|
| Collaborative | 0.187 | 0.130 | 0.058 |
| Hands-on | 0.417 | 0.583 | 0.468 |
| Takeover | 0.000 | 0.027 | 0.009 |
Note: All models are evaluated in a zero-shot setting without reasoning.
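Read as plain data, the table can be inspected programmatically; for example, this model's row (trained on Collaborative) peaks on its own cluster's test data:

```python
# Heatmap values copied from the table above, keyed by training cluster;
# columns are [Collaborative, Hands-on, Takeover] test data.
pts = {
    "Collaborative": [0.187, 0.130, 0.058],
    "Hands-on":      [0.417, 0.583, 0.468],
    "Takeover":      [0.000, 0.027, 0.009],
}

collab_row = pts["Collaborative"]
print(max(collab_row) == collab_row[0])  # True: the diagonal is the row maximum
```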
Citation
If you use this model or dataset, please cite our work: Paper incoming