Instructions for using CowCorpus/Cluster0-Collaborative-Llava with libraries, inference providers, and local apps. Follow the links below to get started.
- Libraries
- Transformers

How to use CowCorpus/Cluster0-Collaborative-Llava with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CowCorpus/Cluster0-Collaborative-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load the model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CowCorpus/Cluster0-Collaborative-Llava")
model = AutoModelForImageTextToText.from_pretrained("CowCorpus/Cluster0-Collaborative-Llava")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use CowCorpus/Cluster0-Collaborative-Llava with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "CowCorpus/Cluster0-Collaborative-Llava"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CowCorpus/Cluster0-Collaborative-Llava",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- SGLang
How to use CowCorpus/Cluster0-Collaborative-Llava with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "CowCorpus/Cluster0-Collaborative-Llava" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CowCorpus/Cluster0-Collaborative-Llava",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Or use the official Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "CowCorpus/Cluster0-Collaborative-Llava" \
    --host 0.0.0.0 \
    --port 30000
```

- Docker Model Runner
How to use CowCorpus/Cluster0-Collaborative-Llava with Docker Model Runner:
```shell
docker model run hf.co/CowCorpus/Cluster0-Collaborative-Llava
```
Model Card for CowCorpus/Cluster0-Collaborative-Llava
This model is a specialized fine-tune of the general CowCorpus-Llava model.
It was specifically further fine-tuned on Cluster 0 - Collaborative User data from the CowCorpus dataset to adapt to the specific intervention preferences and behavioral patterns of this user group.
This model is designed for the task of Human Intervention Prediction in collaborative web navigation. Unlike standard autonomous agents, it predicts when a Collaborative user (Cluster 0) needs to take control from an AI agent. It uses multimodal inputs (screenshots, DOM trees, and action history) to distinguish safe autonomous execution from moments that require human error correction, preference alignment, or assistance.
Model Details
Model Description
- Developed by: CowCorpus Team (Huq et al.)
- Model type: Multimodal Causal Language Model
- Parent Model: CowCorpus/CowCorpus-llama3-llava-next-8b
- Base model: lmms-lab/llama3-llava-next-8b
- Language: English
- License: Llama 3 Community License Agreement
- Paper: Modeling Distinct Human Interaction in Web Agents
- Repository: GitHub: oaishi/CowCorpus
Input Data
The model is trained on a rich, multimodal state representation:
- Visual Screenshot: The pixel-level view of the current webpage.
- UI Structure (AX Tree): The accessibility tree (textual representation of DOM).
- Past Trajectory: The history of actions taken by the agent/human so far.
- Proposed Next Action: The action that the autonomous agent intends to take. The model evaluates if this intent is erroneous.
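The exact prompt format is documented in the GitHub repository; purely as an illustration, the four state components above could be assembled into a chat-template message like this (the `build_intervention_prompt` helper and all field wording are hypothetical, not the training format):

```python
# Hypothetical sketch: packing the multimodal state into one user turn.
# Field wording is illustrative; see the GitHub repo for the real format.
def build_intervention_prompt(screenshot_url, ax_tree, past_actions, proposed_action):
    """Combine screenshot, AX tree, trajectory, and proposed action."""
    state_text = (
        "UI structure (AX tree):\n" + ax_tree + "\n\n"
        "Past trajectory:\n"
        + "\n".join(f"{i + 1}. {a}" for i, a in enumerate(past_actions))
        + "\n\nProposed next action: " + proposed_action
        + "\n\nShould the human take control now?"
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": screenshot_url},  # visual screenshot
                {"type": "text", "text": state_text},      # textual state
            ],
        }
    ]

messages = build_intervention_prompt(
    "https://example.com/step3.png",
    "button 'Checkout'  link 'Cart (2)'",
    ["click('Add to cart')", "click('Cart')"],
    "click('Checkout')",
)
```

A message list in this shape can be fed to `processor.apply_chat_template(...)` as in the Transformers snippet above.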
How to Get Started
For inference code, prompt templates, and setup instructions, please refer to our GitHub Repository.
Training Data
The model underwent a two-stage training process:
- Stage 1 (General Adaptation): Fine-tuned on the complete CowCorpus dataset.
- Stage 2 (User Personalization): Further fine-tuned on the User Cluster 0 subset of CowCorpus, which consists of 101 trajectories and 793 steps.
User Cluster 0 Characteristics:
- Data Source: A subset of the collaborative trajectories specific to User Group 0.
- Behavioral Profile: Collaborative users intervene rarely and modestly, usually late in the task, and show a strong tendency to hand control back to the agent.
Training Configuration
- Hyperparameters:
- Learning Rate: Linear decay from 1e-5 to ~2e-9
- Epochs: 6
- Global Steps: 120
- Batch Size: 1
- Precision: bfloat16
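A minimal sketch of the reported schedule, assuming plain linear interpolation from the initial to the final learning rate over the 120 global steps (the actual trainer implementation may differ, e.g. include warmup):

```python
# Linear decay from 1e-5 to ~2e-9 over 120 global steps (values from
# the card; interpolation shape is an assumption).
def linear_decay_lr(step, total_steps=120, lr_init=1e-5, lr_final=2e-9):
    frac = min(step, total_steps) / total_steps
    return lr_init + frac * (lr_final - lr_init)

print(linear_decay_lr(0))    # initial LR: 1e-05
print(linear_decay_lr(120))  # final LR: ~2e-09
```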
Evaluation: Cross-Cluster Personalization
We evaluate the model using the Perfect Timing Score (PTS), a metric designed to measure the temporal accuracy of intervention predictions.
Because this is a personalized model, we report Cross-Cluster PTS. This measures how well the model (trained on Cluster 0) performs on its own test data versus test data from other user clusters. High performance on the diagonal (matching train/test groups) indicates successful personalization.
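The precise PTS computation is defined in the paper; as rough intuition only, a toy timing metric might score the fraction of trajectories where the predicted intervention step exactly matches the human's actual step (`toy_timing_score` below is illustrative, not the paper's metric):

```python
# Illustrative only: the paper defines the real PTS. This toy variant
# counts exact step matches between predicted and actual interventions.
def toy_timing_score(predicted_steps, actual_steps):
    assert len(predicted_steps) == len(actual_steps)
    hits = sum(p == a for p, a in zip(predicted_steps, actual_steps))
    return hits / len(predicted_steps)

score = toy_timing_score([3, 5, 7], [3, 4, 7])  # 2 of 3 trajectories match
```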
Cross-Cluster PTS Heatmap
The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and Columns represent the User Cluster data it was tested on.
| Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: Takeover (User 3) |
|---|---|---|---|
| Collaborative | 0.187 | 0.130 | 0.058 |
| Hands-on | 0.417 | 0.583 | 0.468 |
| Takeover | 0.000 | 0.027 | 0.009 |
Note: All models are evaluated in a zero-shot setting without reasoning.
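Read as plain data, the table can be inspected programmatically; for example, this model's row (trained on Collaborative) peaks on its own cluster's test data:

```python
# Heatmap values copied from the table above, keyed by training cluster;
# columns are [Collaborative, Hands-on, Takeover] test data.
pts = {
    "Collaborative": [0.187, 0.130, 0.058],
    "Hands-on":      [0.417, 0.583, 0.468],
    "Takeover":      [0.000, 0.027, 0.009],
}

collab_row = pts["Collaborative"]
print(max(collab_row) == collab_row[0])  # True: the diagonal is the row maximum
```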
Citation
If you use this model or dataset, please cite our work: Paper incoming