---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---
# Toxic_Classification (Keras / TensorFlow Model)
This is a multi-class text classification model for toxic content detection.
It was trained as part of the Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation project.
## Task: Multi-Class Toxic Content Detection
The model classifies text (query + image description) into 9 categories:
| Label ID | Category |
|---|---|
| 0 | Child Sexual Exploitation |
| 1 | Elections |
| 2 | Non-Violent Crimes |
| 3 | Safe |
| 4 | Sex-Related Crimes |
| 5 | Suicide & Self-Harm |
| 6 | Unknown S-Type |
| 7 | Violent Crimes |
| 8 | Unsafe |
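The table above can be captured as a simple lookup, which is handy when decoding the model's output index back into a category name (a minimal sketch; the mapping is taken directly from the table):

```python
# Label ID -> category mapping, mirroring the table above
ID_TO_LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}

print(ID_TO_LABEL[3])  # -> Safe
```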
## Model Details
- Framework: TensorFlow 2.19.0 + Keras 3.7.0
- Input: Text + Image description (concatenated string)
- Tokenizer: JSON tokenizer (`tokenizer.json`) with OOV handling and a vocabulary size of 10,000
- Max Sequence Length: 150 tokens
- Output: Softmax probabilities over 9 classes
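Because the model expects a fixed length of 150 tokens, shorter inputs are post-padded with zeros and longer ones post-truncated. The contract can be illustrated with a small pure-Python helper (`pad_or_truncate` is a hypothetical stand-in that mirrors `pad_sequences(..., padding='post', truncating='post')`):

```python
MAX_LEN = 150

def pad_or_truncate(ids, maxlen=MAX_LEN, pad_id=0):
    """Post-pad / post-truncate a token-ID list to a fixed length,
    mimicking pad_sequences(..., padding='post', truncating='post')."""
    return (ids[:maxlen] + [pad_id] * maxlen)[:maxlen]

short = pad_or_truncate([4, 7, 12])        # padded out to 150 IDs
long = pad_or_truncate(list(range(200)))   # truncated down to 150 IDs
print(len(short), short[:4])  # 150 [4, 7, 12, 0]
print(len(long), long[-1])    # 150 149
```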
## Files Included in this Repository
| File | Description |
|---|---|
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json` | Keras tokenizer for preprocessing |
| `config.json` | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt` | Python dependencies |
| `README.md` | This model card |
## Example Usage (Python)
```python
import json

import numpy as np
from keras.saving import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Load tokenizer
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load model
model = load_model("toxic_classifier.keras")

# Example inference: the query and image description are concatenated
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```
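Since the model outputs softmax probabilities over the 9 classes, the probability at the argmax index can also serve as a confidence score. A minimal sketch, using an illustrative stand-in array in place of a real `model.predict(padded)` result:

```python
import numpy as np

# Stand-in for model.predict(padded): one row of softmax
# probabilities over the 9 classes (values are illustrative only).
prediction = np.array([[0.01, 0.02, 0.05, 0.70, 0.04, 0.03, 0.05, 0.05, 0.05]])

predicted_label = int(np.argmax(prediction, axis=1)[0])
confidence = float(prediction[0, predicted_label])

print(f"Predicted Label ID: {predicted_label} (confidence {confidence:.2f})")
# -> Predicted Label ID: 3 (confidence 0.70)
```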
## Resources
- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)
---
## License
MIT License
---
**Author:** Yahya Muhammad Alnwsany
**Contact:** yahyaalnwsany39@gmail.com
**Portfolio:** https://nightprincey.github.io/Portfolio/