NightPrince committed on
Commit 09b2010 (verified) · 1 parent: 0ebf33b

Update README.md

Files changed (1): README.md +119 -3

README.md CHANGED
@@ -1,3 +1,119 @@
- ---
- license: mit
- ---
---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a **multi-class text classification model** for toxic content detection.
It was trained as part of the **Cellula Internship: Safe and Responsible Multi-Modal Toxic Content Moderation** project.

---

## 🚩 Task: Multi-class Toxic Content Detection

The model classifies a single text input, built by concatenating a user query with an image description, into one of **9 categories**:

| Label ID | Category                  |
|----------|---------------------------|
| 0        | Child Sexual Exploitation |
| 1        | Elections                 |
| 2        | Non-Violent Crimes        |
| 3        | Safe                      |
| 4        | Sex-Related Crimes        |
| 5        | Suicide & Self-Harm       |
| 6        | Unknown S-Type            |
| 7        | Violent Crimes            |
| 8        | Unsafe                    |
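
The label table maps directly onto the model's output vector. As a minimal sketch, assuming each softmax row lists the 9 probabilities in label-ID order (which is what the `np.argmax` call in the usage example relies on), the mapping can be used to report the most likely categories. `ID2LABEL` and `top_k_labels` are hypothetical names introduced here, not files or functions shipped in this repository, and the probabilities in the demo are made up:

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: label ID -> category name, copied from the table above.
ID2LABEL: Dict[int, str] = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}


def top_k_labels(probs: List[float], k: int = 3) -> List[Tuple[int, str, float]]:
    """Return the k most probable (label_id, category, probability) triples
    for one softmax row, assumed to hold 9 values in label-ID order."""
    if len(probs) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} probabilities, got {len(probs)}")
    # Sort label IDs by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, ID2LABEL[i], float(probs[i])) for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up probabilities for illustration only (not real model output).
    row = [0.01, 0.02, 0.05, 0.70, 0.02, 0.03, 0.02, 0.10, 0.05]
    for label_id, name, p in top_k_labels(row, k=2):
        print(f"{label_id} {name}: {p:.2f}")
    # prints "3 Safe: 0.70" then "7 Violent Crimes: 0.10"
```

Passing one row of `model.predict(...)` (converted with `.tolist()`) would give the same kind of ranking; reporting the top few categories rather than only the argmax can help when reviewing borderline inputs.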

---

## ✅ Model Details

- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** User query and image description concatenated into a single string (joined with a single space)
- **Tokenizer:** Keras tokenizer serialized to JSON (`tokenizer.json`), with out-of-vocabulary (OOV) handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens (sequences are padded and truncated at the end: `padding='post'`, `truncating='post'`)
- **Output:** Softmax probabilities over the 9 classes
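
The preprocessing above amounts to joining the two text fields and fixing every token sequence at 150 IDs. The plain-Python sketch below shows what the settings used in the example usage (`padding='post'`, `truncating='post'`) do to a sequence of token IDs, assuming the default padding value of `0`. `build_text` and `pad_post` are hypothetical helpers for illustration only; in practice the Keras `pad_sequences` call should be used so that preprocessing matches the saved model:

```python
from typing import List, Sequence

MAX_LEN = 150  # max sequence length stated in the model details


def build_text(query: str, image_desc: str) -> str:
    """Join the user query and image description into one input string,
    using a single space as in the example usage."""
    return query + " " + image_desc


def pad_post(seq: Sequence[int], maxlen: int = MAX_LEN, value: int = 0) -> List[int]:
    """Mimic pad_sequences(padding='post', truncating='post') for one sequence:
    keep the first `maxlen` token IDs, then right-pad with `value`."""
    kept = list(seq[:maxlen])                      # 'post' truncation drops tokens from the end
    return kept + [value] * (maxlen - len(kept))   # 'post' padding appends at the end


if __name__ == "__main__":
    print(build_text("Example user query", "Image describes a dangerous situation"))
    short_seq = pad_post([12, 7, 431], maxlen=6)
    long_seq = pad_post(list(range(1, 201)))
    print(short_seq)                   # [12, 7, 431, 0, 0, 0]
    print(len(long_seq), long_seq[-1])  # 150 150
```

Because truncation is `'post'`, tokens beyond position 150 are dropped from the end of the combined string, so a long query can push the image description out of the model's view entirely.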

---

## ✅ Files Included in This Repository

| File                     | Description                                                        |
|--------------------------|--------------------------------------------------------------------|
| `toxic_classifier.keras` | Saved model in the Keras v3 (`.keras`) format                      |
| `tokenizer.json`         | Keras tokenizer used for preprocessing                             |
| `config.json`            | Model configuration (architecture, vocabulary size, labels, etc.)  |
| `requirements.txt`       | Python dependencies                                                |
| `README.md`              | This model card                                                    |

---

## ✅ Example Usage (Python)

```python
from keras.saving import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Category names in label-ID order, from the category table above
LABELS = [
    "Child Sexual Exploitation", "Elections", "Non-Violent Crimes", "Safe",
    "Sex-Related Crimes", "Suicide & Self-Harm", "Unknown S-Type",
    "Violent Crimes", "Unsafe",
]
MAX_LEN = 150  # max sequence length from the model details

# Load the tokenizer (tokenizer_from_json expects the raw JSON string)
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the model
model = load_model("toxic_classifier.keras")

# Concatenate the query and image description into one input string
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize, then pad/truncate at the end to MAX_LEN tokens
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=MAX_LEN, padding="post", truncating="post")

# prediction has shape (1, 9): softmax probabilities for the single input
prediction = model.predict(padded, verbose=0)
predicted_label = int(np.argmax(prediction, axis=1)[0])
print(f"Predicted Label ID: {predicted_label} ({LABELS[predicted_label]})")
```

---

## 📚 Resources

- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapping Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

- **Author:** Yahya Muhammad Alnwsany
- **Contact:** yahyaalnwsany39@gmail.com
- **Portfolio:** https://nightprincey.github.io/Portfolio/