---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a **multi-class text classification model** for toxic content detection.  
It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.

---

## 🚩 Task: Multi-class Toxic Content Detection

The model classifies text (query + image description) into **9 categories:**

| Label ID | Category                     |
|--------- |------------------------------|
| 0        | Child Sexual Exploitation    |
| 1        | Elections                    |
| 2        | Non-Violent Crimes           |
| 3        | Safe                         |
| 4        | Sex-Related Crimes           |
| 5        | Suicide & Self-Harm          |
| 6        | Unknown S-Type               |
| 7        | Violent Crimes               |
| 8        | Unsafe                       |

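The table above can be mirrored in code to turn a predicted label ID into a readable category name. The sketch below is a minimal, illustrative helper: the names `ID_TO_LABEL` and `decode_prediction` are hypothetical and not part of this repository, and the mapping is copied from the table rather than read from `config.json`. It simply picks the highest-probability class, so it should accept a plain list or a single row of the model's softmax output.

```python
# Hypothetical helper, not shipped with this repository.
# The mapping is copied by hand from the label table above.
ID_TO_LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}


def decode_prediction(probs):
    """Return (category name, probability) for the highest-scoring class."""
    if len(probs) != len(ID_TO_LABEL):
        raise ValueError(f"expected {len(ID_TO_LABEL)} probabilities, got {len(probs)}")
    best_id = max(range(len(probs)), key=lambda i: probs[i])
    return ID_TO_LABEL[best_id], float(probs[best_id])


# Made-up probability vector, for illustration only
example = [0.01, 0.02, 0.02, 0.85, 0.02, 0.02, 0.02, 0.02, 0.02]
label, confidence = decode_prediction(example)
print(label, confidence)
```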
---

## ✅ Model Details

- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** User query and image description, concatenated into a single string
- **Tokenizer:** Keras tokenizer stored as JSON (`tokenizer.json`), with out-of-vocabulary (OOV) handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** Softmax probabilities over 9 classes

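As a rough illustration of the fixed sequence length listed above, the sketch below shows in plain Python what post-padding and post-truncation to 150 tokens amount to, matching the `padding='post', truncating='post'` arguments used in this card's usage example. `pad_post` is a hypothetical name introduced here, and the pad value of 0 is an assumption based on the `pad_sequences` default. In practice `pad_sequences` does this work, so the function is only meant to make the behavior visible: long inputs keep their first 150 token IDs, short inputs are filled with zeros on the right.

```python
MAX_LEN = 150  # max sequence length from the model details above


def pad_post(token_ids, maxlen=MAX_LEN, pad_value=0):
    """Keep the first `maxlen` ids, then right-pad with `pad_value` (assumed 0)."""
    truncated = list(token_ids)[:maxlen]
    return truncated + [pad_value] * (maxlen - len(truncated))


# A short sequence is padded on the right
short = pad_post([5, 12, 7])
print(len(short), short[:5])

# A long sequence loses everything after the first 150 ids
long = pad_post(range(1, 201))
print(len(long), long[-1])
```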
---

## ✅ Files Included in this Repository:

| File                   | Description                         |
|----------------------- |------------------------------------ |
| `toxic_classifier.keras` | Saved Keras v3 model file |
| `tokenizer.json`       | Keras tokenizer for preprocessing |
| `config.json`          | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt`     | Python dependencies |
| `README.md`            | This model card |

---

## ✅ Example Usage (Python):

```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import json

# Load tokenizer
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load model
model = load_model("toxic_classifier.keras")

# Example inference
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding='post', truncating='post')

prediction = model.predict(padded)
predicted_label = np.argmax(prediction, axis=1)[0]
print(f"Predicted Label ID: {predicted_label}")
```

---

## 📚 Resources

- [Cellula Internship Project Proposal](#)  
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany  
**Contact:** yahyaalnwsany39@gmail.com  
**Portfolio:** https://nightprincey.github.io/Portfolio/