---
language: en
tags:
  - toxic-content
  - text-classification
  - keras
  - tensorflow
  - deep-learning
  - safety
  - multiclass
license: mit
datasets:
  - custom
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: Toxic_Classification
    results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a multi-class text classification model for toxic content detection.
It was trained as part of the Cellula Internship project *Safe and Responsible Multi-Modal Toxic Content Moderation*.


## 🚩 Task: Multi-class Toxic Content Detection

The model classifies text (query + image description) into 9 categories:

| Label ID | Category |
|----------|----------|
| 0 | Child Sexual Exploitation |
| 1 | Elections |
| 2 | Non-Violent Crimes |
| 3 | Safe |
| 4 | Sex-Related Crimes |
| 5 | Suicide & Self-Harm |
| 6 | Unknown S-Type |
| 7 | Violent Crimes |
| 8 | Unsafe |

## ✅ Model Details

  • Framework: TensorFlow 2.19.0 + Keras 3.7.0
  • Input: Text + Image description (concatenated string)
  • Tokenizer: JSON tokenizer (tokenizer.json) with OOV handling and vocab size of 10,000
  • Max Sequence Length: 150 tokens
  • Output: Softmax probabilities over 9 classes

## ✅ Files Included in this Repository

| File | Description |
|------|-------------|
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json` | Keras tokenizer for preprocessing |
| `config.json` | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt` | Python dependencies |
| `README.md` | This model card |

## ✅ Example Usage (Python)

```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Load the tokenizer (tokenizer_from_json expects the raw JSON string)
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the saved Keras v3 model
model = load_model("toxic_classifier.keras")

# Example inference: the model expects query + image description as one string
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the model's fixed input length of 150 tokens
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```
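
To turn the raw softmax output into a human-readable category, the label IDs can be mapped back to the names in the table above. The sketch below is illustrative: `decode_prediction` is a hypothetical helper, not part of this repository, and the probability vector is dummy data standing in for `model.predict(padded)[0]`.

```python
import numpy as np

# ID-to-name mapping, taken from the label table in this model card
LABELS = [
    "Child Sexual Exploitation",  # 0
    "Elections",                  # 1
    "Non-Violent Crimes",         # 2
    "Safe",                       # 3
    "Sex-Related Crimes",         # 4
    "Suicide & Self-Harm",        # 5
    "Unknown S-Type",             # 6
    "Violent Crimes",             # 7
    "Unsafe",                     # 8
]

def decode_prediction(probs: np.ndarray, top_k: int = 3):
    """Return (category, probability) pairs for the top_k classes."""
    order = np.argsort(probs)[::-1][:top_k]
    return [(LABELS[i], float(probs[i])) for i in order]

# Dummy probability vector; in practice use model.predict(padded)[0]
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.03, 0.04, 0.05, 0.05, 0.05])
print(decode_prediction(probs))
```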



## 📚 Resources

- [Cellula Internship Project Proposal](#)  
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany  
**Contact:** yahyaalnwsany39@gmail.com  
**Portfolio:** https://nightprincey.github.io/Portfolio/