---
license: apache-2.0
datasets:
- cj-mills/hagrid-classification-512p-no-gesture-150k
language:
- en
base_model:
- google/siglip2-so400m-patch14-384
pipeline_tag: image-classification
library_name: transformers
tags:
- Gesture
- Classification
- SigLIP2
- 19:Styles
- Vision-Encoder
---

![15.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/JBoqEwRBoOQwik0aRYeGw.png)

# **Hand-Gesture-19**  

> **Hand-Gesture-19** is a vision encoder model fine-tuned from **google/siglip2-so400m-patch14-384** for single-label image classification. It classifies hand gesture images into nineteen categories using the **SiglipForImageClassification** architecture.

```py
Classification Report:
                 precision    recall  f1-score   support

           call     0.9889    0.9739    0.9813      6939
        dislike     0.9892    0.9863    0.9877      7028
           fist     0.9956    0.9923    0.9940      6882
           four     0.9632    0.9653    0.9643      7183
           like     0.9668    0.9855    0.9760      6823
           mute     0.9848    0.9976    0.9912      7139
     no_gesture     0.9960    0.9957    0.9958     27823
             ok     0.9872    0.9831    0.9852      6924
            one     0.9817    0.9854    0.9835      7062
           palm     0.9793    0.9848    0.9820      7050
          peace     0.9723    0.9635    0.9679      6965
 peace_inverted     0.9806    0.9836    0.9821      6876
           rock     0.9853    0.9865    0.9859      6883
           stop     0.9614    0.9901    0.9756      6893
  stop_inverted     0.9933    0.9712    0.9821      7142
          three     0.9712    0.9478    0.9594      6940
         three2     0.9785    0.9799    0.9792      6870
         two_up     0.9848    0.9863    0.9855      7346
two_up_inverted     0.9855    0.9871    0.9863      6967

       accuracy                         0.9833    153735
      macro avg     0.9813    0.9814    0.9813    153735
   weighted avg     0.9833    0.9833    0.9833    153735
```
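
A report in this format can be reproduced with `sklearn.metrics.classification_report` once predictions for a held-out set have been collected. Below is a minimal sketch, assuming scikit-learn is installed; the `evaluate` helper and its inputs are illustrative and not part of the original evaluation code.

```python
import torch
from sklearn.metrics import classification_report
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name).eval()
processor = AutoImageProcessor.from_pretrained(model_name)

def evaluate(images, true_label_names):
    """images: list of PIL images; true_label_names: matching list of class names."""
    pred_names = []
    for img in images:
        inputs = processor(images=img, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Map the argmax class id back to its label name from the config.
        pred_names.append(model.config.id2label[logits.argmax(dim=-1).item()])
    print(classification_report(true_label_names, pred_names, digits=4))
```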

![download (2).png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BhwQi6V5Qzl3g33OvRsWz.png)


The model categorizes images into nineteen hand gestures (the same mapping is also stored in the model configuration, as shown after the list):  
- **Class 0:** "call"  
- **Class 1:** "dislike"  
- **Class 2:** "fist"  
- **Class 3:** "four"  
- **Class 4:** "like"  
- **Class 5:** "mute"  
- **Class 6:** "no_gesture"  
- **Class 7:** "ok"  
- **Class 8:** "one"  
- **Class 9:** "palm"  
- **Class 10:** "peace"  
- **Class 11:** "peace_inverted"  
- **Class 12:** "rock"  
- **Class 13:** "stop"  
- **Class 14:** "stop_inverted"  
- **Class 15:** "three"  
- **Class 16:** "three2"  
- **Class 17:** "two_up"  
- **Class 18:** "two_up_inverted"  
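
The id-to-label mapping above ships with the checkpoint configuration, so it can be read at runtime rather than hard-coded. A minimal sketch:

```python
from transformers import SiglipForImageClassification

# Print the id-to-label mapping stored in the model's config.
model = SiglipForImageClassification.from_pretrained("prithivMLmods/Hand-Gesture-19")
for idx in sorted(model.config.id2label):
    print(idx, model.config.id2label[idx])
```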

# **Run with Transformers🤗**  

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def hand_gesture_classification(image):
    """Predicts the hand gesture category from an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    
    labels = {
        "0": "call", 
        "1": "dislike", 
        "2": "fist", 
        "3": "four", 
        "4": "like", 
        "5": "mute", 
        "6": "no_gesture", 
        "7": "ok", 
        "8": "one", 
        "9": "palm", 
        "10": "peace", 
        "11": "peace_inverted", 
        "12": "rock", 
        "13": "stop", 
        "14": "stop_inverted", 
        "15": "three", 
        "16": "three2", 
        "17": "two_up", 
        "18": "two_up_inverted"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=hand_gesture_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Hand Gesture Classification",
    description="Upload an image to classify the hand gesture."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```  
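
For scripted use without the Gradio UI, a single-image inference sketch looks like this; the file name `gesture.jpg` is a placeholder, not an asset shipped with the model.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# "gesture.jpg" is a placeholder path; replace it with your own image.
image = Image.open("gesture.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```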

# **Intended Use:**  

The **Hand-Gesture-19** model is designed to classify hand gesture images into different categories. Potential use cases include:  

- **Human-Computer Interaction:** Enabling gesture-based controls for devices.  
- **Sign Language Interpretation:** Assisting in recognizing sign language gestures.  
- **Gaming & VR:** Enhancing immersive experiences with hand gesture recognition.  
- **Robotics:** Facilitating gesture-based robotic control.  
- **Security & Surveillance:** Identifying gestures for access control and safety monitoring.