	---
	license: llama3
	language:
	- en
	base_model:
	- lmms-lab/llama3-llava-next-8b
	- CowCorpus/CowCorpus-llama3-llava-next-8b
	pipeline_tag: text-generation
	tags:
	- text-generation
	- agent
	- cowcorpus
	- llava
	- personalization
	- user-adaptation
	metrics:
	- accuracy
	- f1
	- perfect-timing-score
	library_name: transformers
	---

	# Model Card for CowCorpus/Cluster0-Collaborative-Llava

	This model is a personalized variant of the general [CowCorpus-Llava](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b) model.

	It was further fine-tuned on the Cluster 0 (Collaborative user) subset of the CowCorpus dataset to adapt to the intervention preferences and behavioral patterns of this user group.

	This model is designed for Human Intervention Prediction in collaborative web navigation. Unlike standard autonomous agents,
	it predicts when a Collaborative user (Cluster 0) needs to take control from an AI agent. It uses multimodal inputs (screenshots, DOM trees, and action history)
	to distinguish between safe autonomous execution and moments that call for human error correction, preference alignment, or assistance.

	## Model Details

	### Model Description

	- Developed by: CowCorpus Team (Huq et al.)
	- Model type: Multimodal Causal Language Model
	- Parent Model: [CowCorpus/CowCorpus-llama3-llava-next-8b](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b)
	- Base model: [lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)
	- Language: English
	- License: [Llama 3 Community License Agreement](https://www.llama.com/llama3/license/)
	- Paper: Modeling Distinct Human Interaction in Web Agents
	- Repository: [GitHub: oaishi/CowCorpus](https://github.com/oaishi/CowCorpus)

	### Input Data
	The model is trained on a rich, multimodal state representation:
	1. Visual Screenshot: The pixel-level view of the current webpage.
	2. UI Structure (AX Tree): The accessibility tree, a textual representation of the DOM.
	3. Past Trajectory: The history of actions taken by the agent/human so far.
	4. Proposed Next Action: The action that the autonomous agent intends to take. The model evaluates if this intent is erroneous.
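As a rough illustration of how the four inputs listed above could be bundled for one decision point, here is a minimal Python sketch. The class name `InterventionState`, its field names, the helper `build_text_prompt`, and the prompt wording are all hypothetical and chosen only for this example; the actual prompt templates live in the GitHub repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterventionState:
    """One decision point, mirroring the four inputs listed above (hypothetical schema)."""
    screenshot_path: str                                    # 1. screenshot of the current page
    ax_tree: str                                            # 2. accessibility tree as text
    past_actions: List[str] = field(default_factory=list)   # 3. trajectory so far
    proposed_action: str = ""                               # 4. action the agent intends to take


def build_text_prompt(state: InterventionState) -> str:
    """Assemble the textual part of the input; the screenshot would be passed separately as an image."""
    history = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(state.past_actions)) or "(none)"
    return (
        f"Accessibility tree:\n{state.ax_tree}\n\n"
        f"Past actions:\n{history}\n\n"
        f"Proposed next action:\n{state.proposed_action}\n\n"
        "Should the user intervene before this action is executed?"
    )


# Made-up example values, for illustration only.
state = InterventionState(
    screenshot_path="step_03.png",
    ax_tree='[12] button "Add to cart"',
    past_actions=['click [4] link "Laptops"', 'click [9] link "Model X"'],
    proposed_action='click [12] button "Add to cart"',
)
print(build_text_prompt(state))
```

Keeping the proposed action as its own field makes it explicit that the model judges a pending action rather than one that has already been executed.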

	## How to Get Started

	For inference code, prompt templates, and setup instructions, please refer to our [GitHub Repository](https://github.com/oaishi/CowCorpus).

	## Training Details

	### Training Data

	The model underwent a two-stage training process:
	1. Stage 1 (General Adaptation): Fine-tuned on the complete CowCorpus dataset.
	2. Stage 2 (User Personalization): Further fine-tuned on the User Cluster 0 subset of CowCorpus, which consists of 101 trajectories and 793 steps.

	User Cluster 0 Characteristics:
	* Data Source: A subset of the collaborative trajectories specific to User Group 0.
	* Behavioral Profile: Collaborative users intervene rarely and modestly, usually later in the task, and show a strong tendency to hand control back to the agent.
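To make the size of the Stage 2 subset concrete, the sketch below counts steps per trajectory for one cluster and derives the average trajectory length from the totals reported above. The record layout (dicts with `cluster` and `trajectory_id` keys, one record per step) and the function name are assumptions made for this example and may not match the real CowCorpus schema.

```python
from collections import defaultdict
from typing import Dict, Iterable


def steps_per_trajectory(records: Iterable[dict], cluster_id: int) -> Dict[str, int]:
    """Count steps per trajectory for one user cluster.

    Assumes one record per step, with hypothetical keys "cluster" and "trajectory_id".
    """
    counts: Dict[str, int] = defaultdict(int)
    for rec in records:
        if rec["cluster"] == cluster_id:
            counts[rec["trajectory_id"]] += 1
    return dict(counts)


# Tiny made-up sample, only to show the shape of the data.
sample = [
    {"cluster": 0, "trajectory_id": "t1"},
    {"cluster": 0, "trajectory_id": "t1"},
    {"cluster": 2, "trajectory_id": "t2"},
    {"cluster": 0, "trajectory_id": "t3"},
]
print(steps_per_trajectory(sample, cluster_id=0))

# Totals reported for the Cluster 0 subset: 793 steps over 101 trajectories.
mean_steps = 793 / 101
print(f"mean steps per trajectory: {mean_steps:.2f}")
```

With the reported totals, a Cluster 0 trajectory averages roughly 7.85 steps.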

	### Training Configuration

	- Hyperparameters:
	  - Learning Rate: Linear decay from 1e-5 to ~2e-9
	  - Epochs: 6
	  - Global Steps: 120
	  - Batch Size: 1
	  - Precision: bfloat16
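The learning-rate entry can be read as a straight line between the two reported endpoints. The sketch below is one way to write that down in plain Python; it assumes the decay spans all 120 global steps with no warmup, which the card does not state, and `linear_lr` is a hypothetical helper rather than the scheduler used in training.

```python
def linear_lr(step: int, total_steps: int = 120,
              lr_start: float = 1e-5, lr_end: float = 2e-9) -> float:
    """Linearly interpolate the learning rate between the two reported endpoints.

    Assumption: `step` runs from 0 to total_steps - 1 and the decay is a straight
    line from lr_start to lr_end; warmup, if any, is not modeled.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    if total_steps == 1:
        return lr_start
    frac = step / (total_steps - 1)
    return lr_start + (lr_end - lr_start) * frac


schedule = [linear_lr(s) for s in range(120)]
print(f"first: {schedule[0]:.2e}, middle: {schedule[60]:.2e}, last: {schedule[-1]:.2e}")

# 120 global steps over 6 epochs gives 20 optimizer steps per epoch,
# assuming the epochs are of equal length.
steps_per_epoch = 120 // 6
print(steps_per_epoch)
```

In a real training run the same shape would normally come from the training framework's built-in linear scheduler instead of a hand-written function.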

	## Evaluation: Cross-Cluster Personalization

	We evaluate the model using the Perfect Timing Score (PTS), a metric designed to measure the temporal accuracy of intervention predictions.

	Because this is a personalized model, we report Cross-Cluster PTS. This measures how well the model (trained on Cluster 0) performs on its own test data versus test data from other user clusters.
	High performance on the diagonal (matching train/test groups) indicates successful personalization.

	### Cross-Cluster PTS Heatmap

	The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and Columns represent the User Cluster data it was tested on.

	| Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: Takeover (User 3) |
	| :--- | :---: | :---: | :---: |
	| Collaborative | 0.187 | 0.130 | 0.058 |
	| Hands-on | 0.417 | 0.583 | 0.468 |
	| Takeover | 0.000 | 0.027 | 0.009 |

	Note: All models are evaluated in a zero-shot setting without reasoning.
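To check the diagonal criterion against the numbers, the following sketch copies the PTS values from the table into a nested dictionary and, for each trained model, compares its in-cluster score with its scores on the other clusters. The variable names and the "gap versus mean off-diagonal" summary are illustrative choices for this example, not part of the official evaluation code.

```python
clusters = ["Collaborative", "Hands-on", "Takeover"]

# pts[train_cluster][test_cluster], copied from the table above.
pts = {
    "Collaborative": {"Collaborative": 0.187, "Hands-on": 0.130, "Takeover": 0.058},
    "Hands-on": {"Collaborative": 0.417, "Hands-on": 0.583, "Takeover": 0.468},
    "Takeover": {"Collaborative": 0.000, "Hands-on": 0.027, "Takeover": 0.009},
}

for train in clusters:
    row = pts[train]
    best_test = max(row, key=row.get)  # test cluster with the highest PTS in this row
    off_diag = [row[t] for t in clusters if t != train]
    gap = row[train] - sum(off_diag) / len(off_diag)
    print(f"{train:>13}: in-cluster={row[train]:.3f}, "
          f"best test cluster={best_test}, gap vs. mean off-diagonal={gap:+.3f}")
```

For the Collaborative (Cluster 0) row, the in-cluster score of 0.187 is the largest entry, which is the row-wise pattern the diagonal criterion looks for.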

	## Citation

	If you use this model or dataset, please cite our work. Citation details will be added once the paper is available.