---
title: Self-Harm Detection - Multimodal Model
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: mit
---
# Self-Harm Content Detection - Multimodal Model
This is a multimodal deep learning model that combines image and text analysis to detect potential self-harm content. The model uses a fusion architecture combining CLIP (for image encoding) and ELECTRA (for text encoding) with a Transformer-based fusion layer.
## Model Architecture
- Image Encoder: CLIP (openai/clip-vit-base-patch32)
- Text Encoder: ELECTRA (sentinet/suicidality)
- Fusion Layer: 2-layer Transformer encoder
- Classifier: 3-layer MLP with dropout
### Architecture Details

**Image Processing:**
- CLIP extracts 512-dimensional image features
- L2 normalization applied
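The L2 normalization step can be sketched in plain Python; the 2-d vector below is just an illustrative stand-in for a 512-dimensional CLIP feature:

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (L2 norm = 1),
    as applied to the CLIP image features."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy 2-d stand-in for a 512-d CLIP embedding
feat = [3.0, 4.0]
unit = l2_normalize(feat)  # [0.6, 0.8]
```

After normalization, the dot product between two image features equals their cosine similarity, which keeps the fusion layer's inputs on a consistent scale.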
**Text Processing:**
- ELECTRA produces 768-dimensional text embeddings
- Mean pooling over tokens
- Projected to 256 dimensions
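Mean pooling averages the per-token ELECTRA embeddings, counting only real (non-padding) tokens. A minimal pure-Python sketch with tiny 2-d vectors standing in for 768-d embeddings:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over non-padding positions.
    token_embeddings: list of per-token vectors
    attention_mask: 1 = real token, 0 = padding"""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:
            totals = [t + v for t, v in zip(totals, vec)]
            count += 1
    return [t / count for t in totals]

# Three tokens; the last is padding and is excluded from the average
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
pooled = mean_pool(tokens, [1, 1, 0])  # [2.0, 3.0]
```

The pooled 768-d vector is then passed through a linear projection down to 256 dimensions before fusion.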
**Fusion:**
- Combined features (768 dims total)
- Positional embeddings added
- 2-layer Transformer for cross-modal attention
- Final representation from image token
**Classification:**
- 3-layer MLP: 768 → 384 → 192 → 2
- GELU activation and dropout for regularization
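The fusion and classification stages above can be sketched in PyTorch. This is one plausible reading of the description, not the repository's actual code: in particular, projecting each modality to a shared 768-d token width (so that both tokens fit a single Transformer encoder) is an assumption, and `FusionClassifier` is a hypothetical name.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Sketch of the fusion head: two tokens (image, text) plus positional
    embeddings pass through a 2-layer Transformer encoder; the image token
    feeds a 3-layer GELU MLP with dropout. Dimensions follow the README;
    the per-modality projections to 768 are assumptions."""

    def __init__(self, img_dim=512, txt_dim=256, d_model=768, n_classes=2, dropout=0.1):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)  # assumed projection
        self.txt_proj = nn.Linear(txt_dim, d_model)  # assumed projection
        self.pos_emb = nn.Parameter(torch.zeros(1, 2, d_model))  # positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           dropout=dropout, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Sequential(  # 768 -> 384 -> 192 -> 2
            nn.Linear(768, 384), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(384, 192), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(192, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        tokens = torch.stack([self.img_proj(img_feat),
                              self.txt_proj(txt_feat)], dim=1)  # (B, 2, 768)
        fused = self.fusion(tokens + self.pos_emb)
        return self.classifier(fused[:, 0])  # logits from the image token

model = FusionClassifier().eval()
logits = model(torch.randn(4, 512), torch.randn(4, 256))  # shape (4, 2)
```

Taking the final representation from the image token (position 0) mirrors the `[CLS]`-token pattern: after cross-modal attention, that token has already attended to the text token.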
## Usage

### Input
- Image: Upload an image that may contain text or visual content
- Text: Enter the text visible in the image (OCR text)
### Output
- Predicted Label: NON-SELF-HARM or SELF-HARM
- Confidence Scores: Probability distribution over both classes
### Classes
- NON-SELF-HARM (Class 0): Content without self-harm indicators
- SELF-HARM (Class 1): Content that may contain self-harm related material
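The confidence scores are obtained by applying a softmax to the two output logits. A minimal sketch with made-up logit values:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NON-SELF-HARM", "SELF-HARM"]
logits = [2.0, -1.0]          # made-up example logits
probs = softmax(logits)       # probabilities summing to 1
pred = labels[probs.index(max(probs))]  # "NON-SELF-HARM"
```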
## Training Data
The model was trained on a balanced dataset of images with associated text, labeled for self-harm content detection.
## Model Performance
The model was trained using:
- Balanced dataset split (80% train, 20% validation)
- AdamW optimizer with differential learning rates
- Cross-entropy loss
- Early stopping with patience=3
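The early-stopping rule (patience=3) can be illustrated with a minimal tracker; this is a generic sketch, not the project's actual training loop:

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [0.9, 0.7, 0.72, 0.71, 0.73]  # no improvement after epoch 2
flags = [stopper.step(l) for l in losses]  # stop fires on the last epoch
```

The differential learning rates mentioned above typically mean smaller rates for the pretrained CLIP/ELECTRA encoders than for the freshly initialized fusion and classifier layers, so the pretrained weights are only gently fine-tuned.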
## Important Notes

⚠️ **Disclaimer**: This model is designed for research and educational purposes only. It should not be used as the sole tool for making critical decisions regarding mental health. Always consult with qualified mental health professionals for serious concerns.
## Technical Details

### Model Parameters
- Fusion text dimension: 256
- Max sequence length: 128
- Image size: 224x224
- Number of classes: 2
### Pretrained Models Used

- CLIP: `openai/clip-vit-base-patch32`
- ELECTRA: `sentinet/suicidality`
## Citation
If you use this model in your research, please cite appropriately.
## Contact
For questions or issues, please open an issue in the repository.
Built with 🤗 Transformers, PyTorch, and Gradio