---
license: apache-2.0
datasets:
- cj-mills/hagrid-classification-512p-no-gesture-150k
language:
- en
base_model:
- google/siglip2-so400m-patch14-384
pipeline_tag: image-classification
library_name: transformers
tags:
- Gesture
- Classification
- SigLIP2
- 19:Styles
- Vision-Encoder
---

![15.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/JBoqEwRBoOQwik0aRYeGw.png)

# **Hand-Gesture-19**  

> **Hand-Gesture-19** is a vision encoder model fine-tuned from **google/siglip2-so400m-patch14-384** for single-label image classification. It classifies hand gesture images into nineteen categories using the **SiglipForImageClassification** architecture.

```py
Classification Report:
                 precision    recall  f1-score   support

           call     0.9889    0.9739    0.9813      6939
        dislike     0.9892    0.9863    0.9877      7028
           fist     0.9956    0.9923    0.9940      6882
           four     0.9632    0.9653    0.9643      7183
           like     0.9668    0.9855    0.9760      6823
           mute     0.9848    0.9976    0.9912      7139
     no_gesture     0.9960    0.9957    0.9958     27823
             ok     0.9872    0.9831    0.9852      6924
            one     0.9817    0.9854    0.9835      7062
           palm     0.9793    0.9848    0.9820      7050
          peace     0.9723    0.9635    0.9679      6965
 peace_inverted     0.9806    0.9836    0.9821      6876
           rock     0.9853    0.9865    0.9859      6883
           stop     0.9614    0.9901    0.9756      6893
  stop_inverted     0.9933    0.9712    0.9821      7142
          three     0.9712    0.9478    0.9594      6940
         three2     0.9785    0.9799    0.9792      6870
         two_up     0.9848    0.9863    0.9855      7346
two_up_inverted     0.9855    0.9871    0.9863      6967

       accuracy                         0.9833    153735
      macro avg     0.9813    0.9814    0.9813    153735
   weighted avg     0.9833    0.9833    0.9833    153735
```
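
A report in this format can be reproduced with `sklearn.metrics.classification_report` once predictions for a held-out set have been collected. Below is a minimal sketch, assuming scikit-learn is installed; the `evaluate` helper and its inputs are illustrative and not part of the original evaluation code.

```python
import torch
from sklearn.metrics import classification_report
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name).eval()
processor = AutoImageProcessor.from_pretrained(model_name)

def evaluate(images, true_label_names):
    """images: list of PIL images; true_label_names: matching list of class names."""
    pred_names = []
    for img in images:
        inputs = processor(images=img, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Map the argmax class id back to its label name from the config.
        pred_names.append(model.config.id2label[logits.argmax(dim=-1).item()])
    print(classification_report(true_label_names, pred_names, digits=4))
```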

![download (2).png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BhwQi6V5Qzl3g33OvRsWz.png)


The model categorizes images into nineteen hand gestures (the same mapping is also stored in the model configuration, as shown after the list):  
- **Class 0:** "call"  
- **Class 1:** "dislike"  
- **Class 2:** "fist"  
- **Class 3:** "four"  
- **Class 4:** "like"  
- **Class 5:** "mute"  
- **Class 6:** "no_gesture"  
- **Class 7:** "ok"  
- **Class 8:** "one"  
- **Class 9:** "palm"  
- **Class 10:** "peace"  
- **Class 11:** "peace_inverted"  
- **Class 12:** "rock"  
- **Class 13:** "stop"  
- **Class 14:** "stop_inverted"  
- **Class 15:** "three"  
- **Class 16:** "three2"  
- **Class 17:** "two_up"  
- **Class 18:** "two_up_inverted"  
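
The id-to-label mapping above ships with the checkpoint configuration, so it can be read at runtime rather than hard-coded. A minimal sketch:

```python
from transformers import SiglipForImageClassification

# Print the id-to-label mapping stored in the model's config.
model = SiglipForImageClassification.from_pretrained("prithivMLmods/Hand-Gesture-19")
for idx in sorted(model.config.id2label):
    print(idx, model.config.id2label[idx])
```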

# **Run with Transformers🤗**  

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def hand_gesture_classification(image):
    """Predicts the hand gesture category from an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    
    labels = {
        "0": "call", 
        "1": "dislike", 
        "2": "fist", 
        "3": "four", 
        "4": "like", 
        "5": "mute", 
        "6": "no_gesture", 
        "7": "ok", 
        "8": "one", 
        "9": "palm", 
        "10": "peace", 
        "11": "peace_inverted", 
        "12": "rock", 
        "13": "stop", 
        "14": "stop_inverted", 
        "15": "three", 
        "16": "three2", 
        "17": "two_up", 
        "18": "two_up_inverted"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=hand_gesture_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Hand Gesture Classification",
    description="Upload an image to classify the hand gesture."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```  
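
For scripted use without the Gradio UI, a single-image inference sketch looks like this; the file name `gesture.jpg` is a placeholder, not an asset shipped with the model.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# "gesture.jpg" is a placeholder path; replace it with your own image.
image = Image.open("gesture.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```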

# **Intended Use:**  

The **Hand-Gesture-19** model is designed to classify hand gesture images into different categories. Potential use cases include:  

- **Human-Computer Interaction:** Enabling gesture-based controls for devices.  
- **Sign Language Interpretation:** Assisting in recognizing sign language gestures.  
- **Gaming & VR:** Enhancing immersive experiences with hand gesture recognition.  
- **Robotics:** Facilitating gesture-based robotic control.  
- **Security & Surveillance:** Identifying gestures for access control and safety monitoring.