---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a **multi-class text classification model** for toxic content detection. It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.

---

## Task: Multi-class Toxic Content Detection

The model classifies text (a user query concatenated with an image description) into **9 categories:**

| Label ID | Category                  |
|----------|---------------------------|
| 0        | Child Sexual Exploitation |
| 1        | Elections                 |
| 2        | Non-Violent Crimes        |
| 3        | Safe                      |
| 4        | Sex-Related Crimes        |
| 5        | Suicide & Self-Harm       |
| 6        | Unknown S-Type            |
| 7        | Violent Crimes            |
| 8        | Unsafe                    |
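The mapping above can be kept next to the model as a plain Python dict. The sketch below is an illustration transcribed solely from this card's table (in practice the authoritative mapping should be read from `config.json`); it shows how a predicted label ID, or a 9-way softmax vector, resolves to a category name:

```python
# Label-ID -> category mapping, transcribed from the table above.
ID2LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}

def resolve_label(probs):
    """Map a 9-way softmax output to (label_id, category_name, confidence)."""
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, ID2LABEL[label_id], probs[label_id]

# Toy probability vector (not real model output):
probs = [0.01, 0.01, 0.02, 0.85, 0.02, 0.02, 0.03, 0.02, 0.02]
print(resolve_label(probs))  # (3, 'Safe', 0.85)
```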

---

## Model Details

- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** text + image description (concatenated into a single string)
- **Tokenizer:** Keras JSON tokenizer (`tokenizer.json`) with OOV handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** softmax probabilities over the 9 classes

---

## Files Included in this Repository

| File                     | Description                                                  |
|--------------------------|--------------------------------------------------------------|
| `toxic_classifier.keras` | Saved Keras v3 model file                                    |
| `tokenizer.json`         | Keras tokenizer for preprocessing                            |
| `config.json`            | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt`       | Python dependencies                                          |
| `README.md`              | This model card                                              |

---

## Example Usage (Python)

```python
import json

import numpy as np
from keras.saving import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Load the tokenizer saved alongside the model
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the Keras v3 model
model = load_model("toxic_classifier.keras")

# Example inference: concatenate the query and the image description
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the model's expected sequence length (150)
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

prediction = model.predict(padded)
predicted_label = int(np.argmax(prediction, axis=1)[0])
print(f"Predicted Label ID: {predicted_label}")
```

## Resources

- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany
**Contact:** yahyaalnwsany39@gmail.com
**Portfolio:** https://nightprincey.github.io/Portfolio/