---
license: apache-2.0
language:
  - en
base_model:
  - google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
  - text-generation-inference
---

# appy-monkey-local-96.07

**appy-monkey-local-96.07** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for binary image classification. The model is built for game content moderation, distinguishing between safe (good) and unsafe (bad) visual content, and uses the `SiglipForImageClassification` architecture for visual understanding.

Paper: [SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features](https://arxiv.org/pdf/2502.14786)

## Classification Report

```text
              precision    recall  f1-score   support

         bad     0.9814    0.9339    0.9571      1755
        good     0.9439    0.9844    0.9637      1983
```

**Accuracy:** 0.9607  
**F1 Score:** 0.9604
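As a sanity check, the per-class numbers above are mutually consistent: reconstructing the approximate confusion matrix from the reported supports and recalls reproduces the published precisions and accuracy. A minimal sketch (the counts are rounded estimates inferred from the report, not taken from the evaluation run itself):

```python
# Supports come straight from the classification report.
support_bad, support_good = 1755, 1983

# Recall * support gives the (approximate) number of correct predictions per class.
tp_bad  = round(0.9339 * support_bad)    # bad images correctly flagged
tp_good = round(0.9844 * support_good)   # good images correctly passed
fn_bad  = support_bad - tp_bad           # bad images missed (predicted good)
fn_good = support_good - tp_good         # good images wrongly flagged (predicted bad)

# Precision = correct predictions of a class / all predictions of that class.
precision_bad  = tp_bad / (tp_bad + fn_good)
precision_good = tp_good / (tp_good + fn_bad)
accuracy = (tp_bad + tp_good) / (support_bad + support_good)

print(f"precision(bad)={precision_bad:.4f}  "
      f"precision(good)={precision_good:.4f}  accuracy={accuracy:.4f}")
```

Rounded to four decimals, these recover the reported 0.9814 / 0.9439 precisions and the 0.9607 accuracy.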

![Figure_1.png](Figure_1.png)


## Label Space: 2 Classes

- Class 0: `bad` (Unsafe content)
- Class 1: `good` (Safe content)
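For illustration, mapping raw two-class logits to these labels amounts to a softmax followed by an index lookup. A minimal pure-Python sketch (the helper name and example logits are hypothetical; the actual inference code in this card uses torch for the same step):

```python
import math

# Index order matches the label space above: 0 = "bad", 1 = "good".
id2label = {0: "bad", 1: "good"}

def logits_to_scores(logits):
    """Convert raw logits to a {label: probability} dict via softmax."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return {id2label[i]: round(e / total, 3) for i, e in enumerate(exps)}

# Example logits favouring class 0 ("bad"); values are made up.
scores = logits_to_scores([2.0, -1.0])
print(scores)
```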

## Install Dependencies

```shell
pip install -q transformers torch pillow gradio hf_xet
```

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/appy-monkey-local-96.07"  # Update with the actual model repo if needed
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Binary label mapping
id2label = {
    "0": "bad",   # Unsafe content
    "1": "good"   # Safe content
}

def classify_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=2, label="Game Content Moderation"),
    title="Appy-Monkey: Game Content Classifier",
    description="Upload a game image or screenshot to classify whether the content is Safe (good) or Unsafe (bad)."
)

if __name__ == "__main__":
    iface.launch()
```

## Intended Use

appy-monkey-local-96.07 is designed for:

- **Game Content Moderation:** Automatically screens in-game visuals or user-submitted content.
- **Parental Control:** Identifies inappropriate or unsafe images in children's game environments.
- **Platform Safety Enforcement:** Supports automated moderation for online multiplayer platforms and game forums.
- **AI Research & Development:** A benchmark for applying vision-language models in safety-critical applications.
- **Community Standards Compliance:** Promotes visual content integrity aligned with safety policies.
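In a moderation pipeline, the classifier's probabilities typically feed a configurable decision threshold rather than a hard argmax, so the false-positive rate can be tuned per deployment. A hypothetical sketch (the helper name and the 0.5 threshold are assumptions for illustration, not part of the model):

```python
# Hypothetical moderation gate: block an image when the model's "bad"
# probability meets or exceeds a configurable threshold.
BLOCK_THRESHOLD = 0.5  # assumed default; tune against your own eval set

def moderate(scores, threshold=BLOCK_THRESHOLD):
    """scores: dict like {"bad": 0.93, "good": 0.07}, e.g. from classify_image."""
    return "blocked" if scores.get("bad", 0.0) >= threshold else "allowed"

print(moderate({"bad": 0.93, "good": 0.07}))  # blocked
print(moderate({"bad": 0.12, "good": 0.88}))  # allowed
```

Raising the threshold trades missed unsafe content for fewer wrongly blocked images; the report above suggests recall on `bad` (0.9339) is the weaker side at the default operating point.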