---
language: en
tags:
  - toxic-content
  - text-classification
  - keras
  - tensorflow
  - deep-learning
  - safety
  - multiclass
license: mit
datasets:
  - custom
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: Toxic_Classification
    results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a multi-class text classification model for toxic content detection.
It was trained as part of the Cellula Internship project *Safe and Responsible Multi-Modal Toxic Content Moderation*.


## 🚩 Task: Multi-class Toxic Content Detection

The model classifies text (query + image description) into 9 categories:

| Label ID | Category |
|----------|----------|
| 0 | Child Sexual Exploitation |
| 1 | Elections |
| 2 | Non-Violent Crimes |
| 3 | Safe |
| 4 | Sex-Related Crimes |
| 5 | Suicide & Self-Harm |
| 6 | Unknown S-Type |
| 7 | Violent Crimes |
| 8 | Unsafe |

## ✅ Model Details

  • Framework: TensorFlow 2.19.0 + Keras 3.7.0
  • Input: Text + Image description (concatenated string)
  • Tokenizer: JSON tokenizer (tokenizer.json) with OOV handling and vocab size of 10,000
  • Max Sequence Length: 150 tokens
  • Output: Softmax probabilities over 9 classes

## ✅ Files Included in this Repository

| File | Description |
|------|-------------|
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json` | Keras tokenizer for preprocessing |
| `config.json` | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt` | Python dependencies |
| `README.md` | This model card |

## ✅ Example Usage (Python)

```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Load the tokenizer (tokenizer_from_json expects the raw JSON string)
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the saved Keras v3 model
model = load_model("toxic_classifier.keras")

# Example inference: the model expects query + image description as one string
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the model's fixed input length of 150 tokens
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```
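
To turn the raw softmax output into a human-readable category, the label IDs can be mapped back to the names in the table above. The sketch below is illustrative: `decode_prediction` is a hypothetical helper, not part of this repository, and the probability vector is dummy data standing in for `model.predict(padded)[0]`.

```python
import numpy as np

# ID-to-name mapping, taken from the label table in this model card
LABELS = [
    "Child Sexual Exploitation",  # 0
    "Elections",                  # 1
    "Non-Violent Crimes",         # 2
    "Safe",                       # 3
    "Sex-Related Crimes",         # 4
    "Suicide & Self-Harm",        # 5
    "Unknown S-Type",             # 6
    "Violent Crimes",             # 7
    "Unsafe",                     # 8
]

def decode_prediction(probs: np.ndarray, top_k: int = 3):
    """Return (category, probability) pairs for the top_k classes."""
    order = np.argsort(probs)[::-1][:top_k]
    return [(LABELS[i], float(probs[i])) for i in order]

# Dummy probability vector; in practice use model.predict(padded)[0]
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.03, 0.04, 0.05, 0.05, 0.05])
print(decode_prediction(probs))
```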



## 📚 Resources

- [Cellula Internship Project Proposal](#)  
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany  
**Contact:** yahyaalnwsany39@gmail.com  
**Portfolio:** https://nightprincey.github.io/Portfolio/