---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
results: []
---
# Toxic_Classification (Keras / TensorFlow Model)
This is a **multi-class text classification model** for toxic content detection.
It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.
---
## 🚩 Task: Multi-class Toxic Content Detection
The model classifies text (query + image description) into **9 categories:**
| Label ID | Category |
|--------- |------------------------------|
| 0 | Child Sexual Exploitation |
| 1 | Elections |
| 2 | Non-Violent Crimes |
| 3 | Safe |
| 4 | Sex-Related Crimes |
| 5 | Suicide & Self-Harm |
| 6 | Unknown S-Type |
| 7 | Violent Crimes |
| 8 | Unsafe |
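For convenience, the table above can be written as a plain Python mapping for decoding predicted IDs (the `ID_TO_LABEL` name is a hypothetical helper for illustration, not a file shipped in this repository):

```python
# Hypothetical mapping from predicted class IDs to category names,
# mirroring the label table above.
ID_TO_LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}

predicted_id = 3  # e.g. the argmax of the model's softmax output
print(ID_TO_LABEL[predicted_id])  # prints "Safe"
```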
---
## ✅ Model Details
- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** Text + Image description (concatenated string)
- **Tokenizer:** JSON tokenizer (`tokenizer.json`) with OOV handling and vocab size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** Softmax probabilities over 9 classes
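The preprocessing described above can be sketched without the repository files by building an equivalent tokenizer; the exact OOV token string and fitting corpus are assumptions here, only the vocabulary size (10,000) and sequence length (150) come from the details listed above:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Assumed settings based on the model details above:
# 10,000-word vocabulary with OOV handling, sequences padded to 150 tokens.
tokenizer = Tokenizer(num_words=10_000, oov_token="<OOV>")
tokenizer.fit_on_texts(["example query describing an image"])

# Query + image description are concatenated into one string before tokenizing
text = "Example user query" + " " + "Image describes a dangerous situation"
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

print(padded.shape)  # one sample, 150 token positions
```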
---
## ✅ Files Included in This Repository
| File | Description |
|----------------------- |------------------------------------ |
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json` | Keras tokenizer for preprocessing |
| `config.json` | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt` | Python dependencies |
| `README.md` | This model card |
---
## ✅ Example Usage (Python)
```python
import json

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Load the saved tokenizer
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the trained model
model = load_model("toxic_classifier.keras")

# Example inference: concatenate the query and image description
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the training sequence length (150 tokens)
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

# Predict and take the most probable of the 9 classes
prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```
## 📚 Resources
- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)
---
## License
MIT License
---
**Author:** Yahya Muhammad Alnwsany
**Contact:** yahyaalnwsany39@gmail.com
**Portfolio:** https://nightprincey.github.io/Portfolio/