---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a **multi-class text classification model** for toxic content detection. It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.

---

## 🚩 Task: Multi-class Toxic Content Detection

The model classifies text (query + image description) into **9 categories:**

| Label ID | Category                  |
|----------|---------------------------|
| 0        | Child Sexual Exploitation |
| 1        | Elections                 |
| 2        | Non-Violent Crimes        |
| 3        | Safe                      |
| 4        | Sex-Related Crimes        |
| 5        | Suicide & Self-Harm       |
| 6        | Unknown S-Type            |
| 7        | Violent Crimes            |
| 8        | Unsafe                    |

---

## ✅ Model Details

- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** Text + image description (concatenated into a single string)
- **Tokenizer:** JSON tokenizer (`tokenizer.json`) with OOV handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** Softmax probabilities over 9 classes

---

## ✅ Files Included in this Repository

| File                     | Description                                                  |
|--------------------------|--------------------------------------------------------------|
| `toxic_classifier.keras` | Saved Keras v3 model file                                    |
| `tokenizer.json`         | Keras tokenizer for preprocessing                            |
| `config.json`            | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt`       | Python dependencies                                          |
| `README.md`              | This model card                                              |

---

## ✅ Example Usage (Python)

```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Load the tokenizer
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the model
model = load_model("toxic_classifier.keras")

# Example inference: the query and image description are concatenated
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the model's fixed input length of 150 tokens
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

# Predict and take the most probable class
prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```

---

## 📚 Resources

- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany
**Contact:** yahyaalnwsany39@gmail.com
**Portfolio:** https://nightprincey.github.io/Portfolio/
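Since `model.predict` returns only a row of softmax scores, mapping the argmax index back to a human-readable category follows the label table in this card. A minimal sketch (the `LABELS` list below is transcribed from the table above, not read from `config.json`, and `decode_prediction` is an illustrative helper, not part of this repository):

```python
# Label IDs as listed in the category table of this model card
LABELS = [
    "Child Sexual Exploitation",  # 0
    "Elections",                  # 1
    "Non-Violent Crimes",         # 2
    "Safe",                       # 3
    "Sex-Related Crimes",         # 4
    "Suicide & Self-Harm",        # 5
    "Unknown S-Type",             # 6
    "Violent Crimes",             # 7
    "Unsafe",                     # 8
]

def decode_prediction(probs):
    """Turn one row of softmax probabilities into (category, confidence)."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[idx], probs[idx]

# Example with a dummy probability row standing in for model.predict output
dummy = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.1]
print(decode_prediction(dummy))  # → ('Safe', 0.9)
```

Keeping the ID-to-name mapping next to the model also makes it easy to report the confidence of the top class alongside the label.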
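The fixed 150-token input length means every sequence is truncated or zero-padded before it reaches the model. As an illustration of what `pad_sequences(..., maxlen=150, padding='post', truncating='post')` does for a single sequence, here is a dependency-free sketch (`pad_post` is a hypothetical stand-in for the Keras utility, shown with a small `maxlen` for readability):

```python
def pad_post(seq, maxlen=150, pad_value=0):
    """Mimic Keras pad_sequences with padding='post', truncating='post'
    for one integer sequence: cut off the tail past maxlen, then
    right-pad with zeros up to maxlen."""
    truncated = seq[:maxlen]  # truncating='post' drops tokens from the end
    return truncated + [pad_value] * (maxlen - len(truncated))

print(pad_post([5, 12, 7], maxlen=6))        # → [5, 12, 7, 0, 0, 0]
print(pad_post(list(range(10)), maxlen=6))   # → [0, 1, 2, 3, 4, 5]
```

In the real pipeline the Keras utility should be used as shown in the example above; this sketch only documents the padding and truncation behavior the model was trained with.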