---
title: Self-Harm Detection - Multimodal Model
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: mit
---

# Self-Harm Content Detection - Multimodal Model

This multimodal deep learning model combines image and text analysis to detect potential self-harm content. It uses a fusion architecture that pairs CLIP (for image encoding) and ELECTRA (for text encoding) with a Transformer-based fusion layer.

## Model Architecture

- **Image Encoder:** CLIP (`openai/clip-vit-base-patch32`)
- **Text Encoder:** ELECTRA (`sentinet/suicidality`)
- **Fusion Layer:** 2-layer Transformer encoder
- **Classifier:** 3-layer MLP with dropout

### Architecture Details

1. **Image Processing**
   - CLIP extracts 512-dimensional image features
   - L2 normalization is applied
2. **Text Processing**
   - ELECTRA produces 768-dimensional text embeddings
   - Mean pooling over tokens
   - Projected to 256 dimensions
3. **Fusion**
   - Combined features (768 dimensions total)
   - Positional embeddings added
   - 2-layer Transformer for cross-modal attention
   - Final representation taken from the image token
4. **Classification**
   - 3-layer MLP: 768 → 384 → 192 → 2
   - GELU activation and dropout for regularization

## Usage

### Input

1. **Image:** Upload an image that may contain text or visual content
2. **Text:** Enter the text visible in the image (e.g., OCR-extracted text)

### Output

- **Predicted Label:** NON-SELF-HARM or SELF-HARM
- **Confidence Scores:** Probability distribution over both classes

## Classes

- **NON-SELF-HARM (Class 0):** Content without self-harm indicators
- **SELF-HARM (Class 1):** Content that may contain self-harm related material

## Training Data

The model was trained on a balanced dataset of images with associated text, labeled for self-harm content detection.

## Model Performance

The model was trained with:

- Balanced dataset split (80% train, 20% validation)
- AdamW optimizer with differential learning rates
- Cross-entropy loss
- Early stopping with patience = 3

## Important Notes

⚠️ **Disclaimer:** This model is intended for research and educational purposes only. It should not be used as the sole tool for making critical decisions about mental health. Always consult qualified mental health professionals for serious concerns.

## Technical Details

### Model Parameters

- Fusion text dimension: 256
- Max sequence length: 128 tokens
- Image size: 224x224 pixels
- Number of classes: 2

### Pretrained Models Used

- CLIP: `openai/clip-vit-base-patch32`
- ELECTRA: `sentinet/suicidality`

## Citation

If you use this model in your research, please cite it appropriately.

## Contact

For questions or issues, please open an issue in the repository.


Built with 🤗 Transformers, PyTorch, and Gradio