File size: 17,968 Bytes
424b150 4e62a46 3fb712f 5740fec 3fb712f 5740fec 3f8ecb2 3fb712f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 |
---
license: apache-2.0
datasets:
- Bruece/reclip_domainnet_126_clipart
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Clipart-126-DomainNet
---

# **Clipart-126-DomainNet**
> **Clipart-126-DomainNet** is an image classification vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the **SiglipForImageClassification** architecture.

*Moment Matching for Multi-Source Domain Adaptation* : https://arxiv.org/pdf/1812.01754
*SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features* https://arxiv.org/pdf/2502.14786
```py
Classification Report:
precision recall f1-score support
aircraft_carrier 0.8667 0.4643 0.6047 56
alarm_clock 0.9706 0.8919 0.9296 74
ant 0.8889 0.8615 0.8750 65
anvil 0.5984 0.6083 0.6033 120
asparagus 0.8158 0.6078 0.6966 51
axe 0.7544 0.5309 0.6232 81
banana 0.7111 0.5517 0.6214 58
basket 0.8571 0.8182 0.8372 66
bathtub 0.7531 0.7821 0.7673 78
bear 0.9118 0.6458 0.7561 48
bee 0.9636 0.9636 0.9636 165
bird 0.8967 0.9529 0.9240 255
blackberry 0.8082 0.8429 0.8252 70
blueberry 0.8661 0.8981 0.8818 108
bottlecap 0.7821 0.8299 0.8053 147
broccoli 0.8947 0.8947 0.8947 95
bus 0.9663 0.9348 0.9503 92
butterfly 0.9333 0.9545 0.9438 132
cactus 0.9677 0.9091 0.9375 99
cake 0.8750 0.8099 0.8412 121
calculator 0.9583 0.5897 0.7302 39
camel 0.9391 0.9310 0.9351 116
camera 0.8846 0.8679 0.8762 53
candle 0.8298 0.8478 0.8387 92
cannon 0.8551 0.8551 0.8551 69
canoe 0.8462 0.7432 0.7914 74
carrot 0.8800 0.7719 0.8224 57
castle 1.0000 0.8511 0.9195 47
cat 0.8167 0.7903 0.8033 62
ceiling_fan 1.0000 0.2000 0.3333 30
cell_phone 0.7400 0.6491 0.6916 57
cello 0.8372 0.9114 0.8727 79
chair 0.8986 0.8378 0.8671 74
chandelier 0.9617 0.9263 0.9437 190
coffee_cup 0.8811 0.9389 0.9091 229
compass 0.9799 0.9012 0.9389 162
computer 0.7124 0.9045 0.7970 178
cow 0.9517 0.9718 0.9617 142
crab 0.8738 0.9000 0.8867 100
crocodile 0.9778 0.9167 0.9462 144
cruise_ship 0.8544 0.9072 0.8800 194
dog 0.8125 0.7761 0.7939 67
dolphin 0.7680 0.7500 0.7589 128
dragon 0.9512 0.9176 0.9341 85
drums 0.8919 0.9635 0.9263 137
duck 0.8774 0.8447 0.8608 161
dumbbell 0.9048 0.9500 0.9268 280
elephant 0.9038 0.8952 0.8995 105
eyeglasses 0.8636 0.8488 0.8562 291
feather 0.8564 0.9227 0.8883 181
fence 0.9211 0.8400 0.8787 125
fish 0.8963 0.8768 0.8864 138
flamingo 0.9636 0.9381 0.9507 226
flower 0.9146 0.9454 0.9298 238
foot 0.8780 0.8889 0.8834 81
fork 0.9032 0.9091 0.9061 154
frog 0.9420 0.9489 0.9455 137
giraffe 0.9643 0.9153 0.9391 118
goatee 0.8763 0.9422 0.9081 173
grapes 0.9114 0.8571 0.8834 84
guitar 0.9595 0.8554 0.9045 83
hammer 0.6111 0.7719 0.6822 114
helicopter 0.9444 0.9533 0.9488 107
helmet 0.7368 0.8550 0.7915 131
horse 0.9588 0.9819 0.9702 166
kangaroo 0.9125 0.8488 0.8795 86
lantern 0.8254 0.7536 0.7879 69
laptop 0.8108 0.5000 0.6186 60
leaf 0.7143 0.3333 0.4545 30
lion 0.9744 0.8085 0.8837 47
lipstick 0.7875 0.6632 0.7200 95
lobster 0.8963 0.9130 0.9046 161
microphone 0.7925 0.9231 0.8528 91
monkey 0.9623 0.9027 0.9315 113
mosquito 0.8636 0.8444 0.8539 45
mouse 0.9167 0.8333 0.8730 66
mug 0.8989 0.8163 0.8556 98
mushroom 0.9429 0.9429 0.9429 105
onion 0.9365 0.8429 0.8872 140
panda 1.0000 0.9726 0.9861 73
peanut 0.5900 0.7195 0.6484 82
pear 0.7692 0.7246 0.7463 69
peas 0.8000 0.7429 0.7704 70
pencil 0.6667 0.0909 0.1600 44
penguin 0.9717 0.9279 0.9493 111
pig 0.9551 0.8252 0.8854 103
pillow 0.6290 0.5571 0.5909 70
pineapple 0.9846 0.8889 0.9343 72
potato 0.6038 0.6531 0.6275 98
power_outlet 0.8636 0.4043 0.5507 47
purse 0.0000 0.0000 0.0000 27
rabbit 0.9341 0.8586 0.8947 99
raccoon 0.8836 0.9021 0.8927 143
rhinoceros 0.8750 0.9459 0.9091 74
rifle 0.7595 0.7500 0.7547 80
saxophone 0.9454 0.9886 0.9665 175
screwdriver 0.7521 0.6929 0.7213 127
sea_turtle 0.9677 0.9626 0.9651 187
see_saw 0.6679 0.8698 0.7556 215
sheep 0.9355 0.9158 0.9255 95
shoe 0.8969 0.8700 0.8832 100
skateboard 0.8632 0.8673 0.8652 211
snake 0.9302 0.9160 0.9231 131
speedboat 0.8187 0.8976 0.8563 166
spider 0.9043 0.9286 0.9163 112
squirrel 0.7945 0.8855 0.8375 131
strawberry 0.8687 0.9923 0.9264 260
streetlight 0.8178 0.9293 0.8700 198
string_bean 0.8525 0.8000 0.8254 65
submarine 0.8022 0.8902 0.8439 164
swan 0.8397 0.9003 0.8690 291
table 0.8564 0.9200 0.8871 175
teapot 0.8763 0.9189 0.8971 185
teddy-bear 0.9006 0.8953 0.8980 172
television 0.8509 0.8220 0.8362 118
the_Eiffel_Tower 0.9468 0.9082 0.9271 98
the_Great_Wall_of_China 0.9462 0.9462 0.9462 93
tiger 0.9417 0.9826 0.9617 230
toe 0.8250 0.6600 0.7333 50
train 0.9362 0.9778 0.9565 90
truck 0.9367 0.8916 0.9136 83
umbrella 0.9633 0.9545 0.9589 110
vase 0.7642 0.8393 0.8000 112
watermelon 0.9527 0.9527 0.9527 148
whale 0.7453 0.8144 0.7783 194
zebra 0.9275 0.9676 0.9471 185
accuracy 0.8691 14818
macro avg 0.8613 0.8251 0.8351 14818
weighted avg 0.8705 0.8691 0.8661 14818
```
The model categorizes images into the following 126 classes:
- **Class 0:** "aircraft_carrier"
- **Class 1:** "alarm_clock"
- **Class 2:** "ant"
- **Class 3:** "anvil"
- **Class 4:** "asparagus"
- **Class 5:** "axe"
- **Class 6:** "banana"
- **Class 7:** "basket"
- **Class 8:** "bathtub"
- **Class 9:** "bear"
- **Class 10:** "bee"
- **Class 11:** "bird"
- **Class 12:** "blackberry"
- **Class 13:** "blueberry"
- **Class 14:** "bottlecap"
- **Class 15:** "broccoli"
- **Class 16:** "bus"
- **Class 17:** "butterfly"
- **Class 18:** "cactus"
- **Class 19:** "cake"
- **Class 20:** "calculator"
- **Class 21:** "camel"
- **Class 22:** "camera"
- **Class 23:** "candle"
- **Class 24:** "cannon"
- **Class 25:** "canoe"
- **Class 26:** "carrot"
- **Class 27:** "castle"
- **Class 28:** "cat"
- **Class 29:** "ceiling_fan"
- **Class 30:** "cell_phone"
- **Class 31:** "cello"
- **Class 32:** "chair"
- **Class 33:** "chandelier"
- **Class 34:** "coffee_cup"
- **Class 35:** "compass"
- **Class 36:** "computer"
- **Class 37:** "cow"
- **Class 38:** "crab"
- **Class 39:** "crocodile"
- **Class 40:** "cruise_ship"
- **Class 41:** "dog"
- **Class 42:** "dolphin"
- **Class 43:** "dragon"
- **Class 44:** "drums"
- **Class 45:** "duck"
- **Class 46:** "dumbbell"
- **Class 47:** "elephant"
- **Class 48:** "eyeglasses"
- **Class 49:** "feather"
- **Class 50:** "fence"
- **Class 51:** "fish"
- **Class 52:** "flamingo"
- **Class 53:** "flower"
- **Class 54:** "foot"
- **Class 55:** "fork"
- **Class 56:** "frog"
- **Class 57:** "giraffe"
- **Class 58:** "goatee"
- **Class 59:** "grapes"
- **Class 60:** "guitar"
- **Class 61:** "hammer"
- **Class 62:** "helicopter"
- **Class 63:** "helmet"
- **Class 64:** "horse"
- **Class 65:** "kangaroo"
- **Class 66:** "lantern"
- **Class 67:** "laptop"
- **Class 68:** "leaf"
- **Class 69:** "lion"
- **Class 70:** "lipstick"
- **Class 71:** "lobster"
- **Class 72:** "microphone"
- **Class 73:** "monkey"
- **Class 74:** "mosquito"
- **Class 75:** "mouse"
- **Class 76:** "mug"
- **Class 77:** "mushroom"
- **Class 78:** "onion"
- **Class 79:** "panda"
- **Class 80:** "peanut"
- **Class 81:** "pear"
- **Class 82:** "peas"
- **Class 83:** "pencil"
- **Class 84:** "penguin"
- **Class 85:** "pig"
- **Class 86:** "pillow"
- **Class 87:** "pineapple"
- **Class 88:** "potato"
- **Class 89:** "power_outlet"
- **Class 90:** "purse"
- **Class 91:** "rabbit"
- **Class 92:** "raccoon"
- **Class 93:** "rhinoceros"
- **Class 94:** "rifle"
- **Class 95:** "saxophone"
- **Class 96:** "screwdriver"
- **Class 97:** "sea_turtle"
- **Class 98:** "see_saw"
- **Class 99:** "sheep"
- **Class 100:** "shoe"
- **Class 101:** "skateboard"
- **Class 102:** "snake"
- **Class 103:** "speedboat"
- **Class 104:** "spider"
- **Class 105:** "squirrel"
- **Class 106:** "strawberry"
- **Class 107:** "streetlight"
- **Class 108:** "string_bean"
- **Class 109:** "submarine"
- **Class 110:** "swan"
- **Class 111:** "table"
- **Class 112:** "teapot"
- **Class 113:** "teddy-bear"
- **Class 114:** "television"
- **Class 115:** "the_Eiffel_Tower"
- **Class 116:** "the_Great_Wall_of_China"
- **Class 117:** "tiger"
- **Class 118:** "toe"
- **Class 119:** "train"
- **Class 120:** "truck"
- **Class 121:** "umbrella"
- **Class 122:** "vase"
- **Class 123:** "watermelon"
- **Class 124:** "whale"
- **Class 125:** "zebra"
# **Run with Transformers🤗**
```python
!pip install -q transformers torch pillow gradio
```
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from transformers.image_utils import load_image
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/Clipart-126-DomainNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
def clipart_classification(image):
"""Predicts the clipart category for an input image."""
# Convert the input numpy array to a PIL Image and ensure it's in RGB format
image = Image.fromarray(image).convert("RGB")
# Process the image and prepare it for the model
inputs = processor(images=image, return_tensors="pt")
# Perform inference without gradient computation
with torch.no_grad():
outputs = model(**inputs)
logits = outputs.logits
# Apply softmax to obtain probabilities for each class
probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
# Mapping from indices to clipart category labels
labels = {
"0": "aircraft_carrier", "1": "alarm_clock", "2": "ant", "3": "anvil", "4": "asparagus",
"5": "axe", "6": "banana", "7": "basket", "8": "bathtub", "9": "bear",
"10": "bee", "11": "bird", "12": "blackberry", "13": "blueberry", "14": "bottlecap",
"15": "broccoli", "16": "bus", "17": "butterfly", "18": "cactus", "19": "cake",
"20": "calculator", "21": "camel", "22": "camera", "23": "candle", "24": "cannon",
"25": "canoe", "26": "carrot", "27": "castle", "28": "cat", "29": "ceiling_fan",
"30": "cell_phone", "31": "cello", "32": "chair", "33": "chandelier", "34": "coffee_cup",
"35": "compass", "36": "computer", "37": "cow", "38": "crab", "39": "crocodile",
"40": "cruise_ship", "41": "dog", "42": "dolphin", "43": "dragon", "44": "drums",
"45": "duck", "46": "dumbbell", "47": "elephant", "48": "eyeglasses", "49": "feather",
"50": "fence", "51": "fish", "52": "flamingo", "53": "flower", "54": "foot",
"55": "fork", "56": "frog", "57": "giraffe", "58": "goatee", "59": "grapes",
"60": "guitar", "61": "hammer", "62": "helicopter", "63": "helmet", "64": "horse",
"65": "kangaroo", "66": "lantern", "67": "laptop", "68": "leaf", "69": "lion",
"70": "lipstick", "71": "lobster", "72": "microphone", "73": "monkey", "74": "mosquito",
"75": "mouse", "76": "mug", "77": "mushroom", "78": "onion", "79": "panda",
"80": "peanut", "81": "pear", "82": "peas", "83": "pencil", "84": "penguin",
"85": "pig", "86": "pillow", "87": "pineapple", "88": "potato", "89": "power_outlet",
"90": "purse", "91": "rabbit", "92": "raccoon", "93": "rhinoceros", "94": "rifle",
"95": "saxophone", "96": "screwdriver", "97": "sea_turtle", "98": "see_saw", "99": "sheep",
"100": "shoe", "101": "skateboard", "102": "snake", "103": "speedboat", "104": "spider",
"105": "squirrel", "106": "strawberry", "107": "streetlight", "108": "string_bean",
"109": "submarine", "110": "swan", "111": "table", "112": "teapot", "113": "teddy-bear",
"114": "television", "115": "the_Eiffel_Tower", "116": "the_Great_Wall_of_China",
"117": "tiger", "118": "toe", "119": "train", "120": "truck", "121": "umbrella",
"122": "vase", "123": "watermelon", "124": "whale", "125": "zebra"
}
# Create a dictionary mapping each label to its corresponding probability (rounded)
predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
return predictions
# Create Gradio interface
iface = gr.Interface(
fn=clipart_classification,
inputs=gr.Image(type="numpy"),
outputs=gr.Label(label="Prediction Scores"),
title="Clipart-126-DomainNet Classification",
description="Upload a clipart image to classify it into one of 126 domain categories."
)
# Launch the app
if __name__ == "__main__":
iface.launch()
```
---
# **Intended Use:**
The **Clipart-126-DomainNet** model is designed for clipart image classification. It categorizes clipart images into a wide range of domains—from objects like an "aircraft_carrier" or "alarm_clock" to various everyday items. Potential use cases include:
- **Digital Art and Design:** Assisting designers in organizing and retrieving clipart assets.
- **Content Management:** Enhancing digital asset management systems with robust clipart classification.
- **Creative Search Engines:** Enabling clipart-based search for design inspiration and resource curation.
- **Computer Vision Research:** Serving as a benchmark for studies in clipart recognition and domain adaptation. |