---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---
# Toxic_Classification (Keras / TensorFlow Model)
This is a **multi-class text classification model** for toxic content detection.
It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.
---
## 🚩 Task: Multi-class Toxic Content Detection
The model classifies text (query + image description) into **9 categories:**
| Label ID | Category |
|--------- |------------------------------|
| 0 | Child Sexual Exploitation |
| 1 | Elections |
| 2 | Non-Violent Crimes |
| 3 | Safe |
| 4 | Sex-Related Crimes |
| 5 | Suicide & Self-Harm |
| 6 | Unknown S-Type |
| 7 | Violent Crimes |
| 8 | Unsafe |
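
The label IDs in the table can be turned into readable category names with a small lookup. The following is a minimal sketch, not code shipped with this repository: `ID_TO_LABEL` and `decode_prediction` are hypothetical names, and the probability vector is made up purely for illustration.

```python
# Hypothetical lookup table mirroring the label table in this model card.
ID_TO_LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}


def decode_prediction(probs):
    """Return (label_id, category, probability) for one 9-way probability vector."""
    probs = list(probs)
    if len(probs) != len(ID_TO_LABEL):
        raise ValueError(f"expected {len(ID_TO_LABEL)} probabilities, got {len(probs)}")
    # Index of the largest probability (equivalent to argmax)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, ID_TO_LABEL[label_id], float(probs[label_id])


# Made-up probabilities for illustration only, not real model output.
example_probs = [0.02, 0.01, 0.05, 0.80, 0.02, 0.03, 0.02, 0.03, 0.02]
print(decode_prediction(example_probs))  # (3, 'Safe', 0.8)
```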
---
## ✅ Model Details
- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** Text + Image description (concatenated string)
- **Tokenizer:** Keras tokenizer serialized as JSON (`tokenizer.json`), with out-of-vocabulary (OOV) handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** Softmax probabilities over 9 classes
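
To make the fixed input length concrete, here is a minimal pure-Python sketch of how a single sequence of token IDs is cut or padded to 150 entries. It assumes padding and truncation both happen at the end of the sequence and that the padding ID is 0; `pad_or_truncate` is a hypothetical helper for illustration, not part of this repository.

```python
MAX_LEN = 150  # max sequence length stated in Model Details


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=0):
    """Keep the first max_len IDs, then right-pad with pad_id up to max_len."""
    kept = list(token_ids)[:max_len]
    return kept + [pad_id] * (max_len - len(kept))


short = pad_or_truncate([12, 7, 431])   # shorter than 150: zeros are appended
long = pad_or_truncate(range(1, 201))   # longer than 150: the tail is dropped

print(len(short), short[:5])  # 150 [12, 7, 431, 0, 0]
print(len(long), long[-1])    # 150 150
```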
---
## ✅ Files Included in this Repository:
| File | Description |
|----------------------- |------------------------------------ |
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json` | Keras tokenizer for preprocessing |
| `config.json` | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt` | Python dependencies |
| `README.md` | This model card |
---
## ✅ Example Usage (Python):
```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Load the tokenizer from its JSON serialization
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the saved Keras model
model = load_model("toxic_classifier.keras")

# Example inference: the query and image description are joined into one string
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Convert the text to token IDs, then pad or truncate to 150 tokens
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding='post', truncating='post')

# Predict class probabilities and take the most likely label ID
prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```

## 📚 Resources
- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)
---
## License
MIT License
---
- **Author:** Yahya Muhammad Alnwsany
- **Contact:** yahyaalnwsany39@gmail.com
- **Portfolio:** https://nightprincey.github.io/Portfolio/