prithivMLmods committed on
Commit ff780c4 · verified · 1 Parent(s): 4357a7c

Update README.md

Files changed (1)
  1. README.md +92 -2
README.md CHANGED
@@ -7,7 +7,18 @@ base_model:
  ---
 
 
- # agent-browse / calendars / human-browse
+ # **WebClick-AgentBrowse-SigLIP2**
+
+ > **WebClick-AgentBrowse-SigLIP2** is a vision-language encoder model fine-tuned from [`google/siglip2-base-patch16-512`](https://huggingface.co/google/siglip2-base-patch16-512) for **multi-class image classification**.
+ It is trained to classify web UI click-region images into three classes: `agentbrowse`, `calendars`, and `humanbrowse`. The model uses the `SiglipForImageClassification` architecture.
+
+ > [!note]
+ > **SigLIP 2**: *Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features*
+ > [https://arxiv.org/pdf/2502.14786](https://arxiv.org/pdf/2502.14786)
+
+
+ > [!note]
+ > agent-browse / calendars / human-browse
 
  ---
 
@@ -28,6 +39,76 @@ weighted avg 0.9263 0.9219 0.9224 1639
 
  ---
 
+ ## Label Space: 3 Classes
+
+ ```
+
+ Class 0: agentbrowse
+ Class 1: calendars
+ Class 2: humanbrowse
+
+ ```
+
+ ---
+
+ ## Install Dependencies
+
+ ```bash
+ pip install -q transformers torch pillow gradio hf_xet
+ ```
+
+ ---
+
+ ## Inference Code
+
+ ```python
+ import gradio as gr
+ from transformers import AutoImageProcessor, SiglipForImageClassification
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ model_name = "prithivMLmods/webclick-agentbrowse-siglip2" # Replace with actual HF model repo
+ model = SiglipForImageClassification.from_pretrained(model_name)
+ processor = AutoImageProcessor.from_pretrained(model_name)
+
+ # Updated label mapping
+ id2label = {
+     "0": "agentbrowse",
+     "1": "calendars",
+     "2": "humanbrowse"
+ }
+
+ def classify_image(image):
+     image = Image.fromarray(image).convert("RGB")
+     inputs = processor(images=image, return_tensors="pt")
+
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+         probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
+
+     prediction = {
+         id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
+     }
+
+     return prediction
+
+ # Gradio Interface
+ iface = gr.Interface(
+     fn=classify_image,
+     inputs=gr.Image(type="numpy"),
+     outputs=gr.Label(num_top_classes=3, label="Click Type Classification"),
+     title="WebClick AgentBrowse Classifier",
+     description="Upload a web UI screenshot to classify regions: agentbrowse, calendars, or humanbrowse."
+ )
+
+ if __name__ == "__main__":
+     iface.launch()
+ ```
+
+ ---
+
  ## ID2Label Testing
 
  ```py
@@ -55,4 +136,13 @@ print(id2label)
  {'0': 'agentbrowse', '1': 'calendars', '2': 'humanbrowse'}
  ```
 
- ---
+ ---
+
+ ## Intended Use
+
+ **WebClick-AgentBrowse-SigLIP2** is intended for:
+
+ * **UI Understanding** – Classify user interaction zones in web interface screenshots.
+ * **Multimodal Agents** – Enhance visual perception for agent planning or RPA systems.
+ * **Interface Automation** – Facilitate click zone detection for automated agents.
+ * **Web Analytics** – Analyze user behavior patterns based on layout interaction predictions.
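The `classify_image` function added in this commit turns the model's logits into a label-to-probability dictionary: a softmax over the three logits, then a lookup through the string-keyed `id2label` mapping. The sketch below reproduces only that post-processing step in plain Python so it can be checked without downloading the model. It is an illustration under assumptions, not code from the model repo: `logits_to_prediction` is a hypothetical helper name, and the logits in the example are made-up numbers, not real model output.

```python
import math

# Label mapping as printed in the README's "ID2Label Testing" section
# (string keys "0", "1", "2").
ID2LABEL = {"0": "agentbrowse", "1": "calendars", "2": "humanbrowse"}


def logits_to_prediction(logits, id2label=ID2LABEL, ndigits=3):
    """Hypothetical helper: softmax over one image's logits, then map each
    class index to its label name, mirroring the dict built in classify_image()."""
    # Subtract the largest logit before exponentiating; softmax is
    # shift-invariant, so this only improves numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index i of the probability list corresponds to key str(i) in id2label.
    return {id2label[str(i)]: round(p, ndigits) for i, p in enumerate(probs)}


if __name__ == "__main__":
    # Made-up logits for illustration only.
    pred = logits_to_prediction([2.0, 0.5, -1.0])
    print(pred)
    # The single top class is the key with the highest probability.
    print(max(pred, key=pred.get))
```

Keeping the keys as strings matches the `id2label[str(i)]` lookup in the README code; if a mapping with integer keys were used instead, the `str(i)` conversion would have to be dropped.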