---
language: en
tags:
- toxic-content
- text-classification
- keras
- tensorflow
- deep-learning
- safety
- multiclass
license: mit
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: Toxic_Classification
  results: []
---

# Toxic_Classification (Keras / TensorFlow Model)

This is a **multi-class text classification model** for toxic content detection. It was trained as part of the **Cellula Internship - Safe and Responsible Multi-Modal Toxic Content Moderation** project.

---

## Task: Multi-class Toxic Content Detection

The model classifies text (a user query concatenated with an image description) into **9 categories:**

| Label ID | Category                  |
|----------|---------------------------|
| 0        | Child Sexual Exploitation |
| 1        | Elections                 |
| 2        | Non-Violent Crimes        |
| 3        | Safe                      |
| 4        | Sex-Related Crimes        |
| 5        | Suicide & Self-Harm       |
| 6        | Unknown S-Type            |
| 7        | Violent Crimes            |
| 8        | Unsafe                    |
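The mapping above can be kept next to the model as a plain Python dict. The sketch below is an illustration transcribed solely from this card's table (in practice the authoritative mapping should be read from `config.json`); it shows how a predicted label ID, or a 9-way softmax vector, resolves to a category name:

```python
# Label-ID -> category mapping, transcribed from the table above.
ID2LABEL = {
    0: "Child Sexual Exploitation",
    1: "Elections",
    2: "Non-Violent Crimes",
    3: "Safe",
    4: "Sex-Related Crimes",
    5: "Suicide & Self-Harm",
    6: "Unknown S-Type",
    7: "Violent Crimes",
    8: "Unsafe",
}

def resolve_label(probs):
    """Map a 9-way softmax output to (label_id, category_name, confidence)."""
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, ID2LABEL[label_id], probs[label_id]

# Toy probability vector (not real model output):
probs = [0.01, 0.01, 0.02, 0.85, 0.02, 0.02, 0.03, 0.02, 0.02]
print(resolve_label(probs))  # (3, 'Safe', 0.85)
```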

---

## Model Details

- **Framework:** TensorFlow 2.19.0 + Keras 3.7.0
- **Input:** text + image description (concatenated into a single string)
- **Tokenizer:** Keras JSON tokenizer (`tokenizer.json`) with OOV handling and a vocabulary size of 10,000
- **Max Sequence Length:** 150 tokens
- **Output:** softmax probabilities over the 9 classes

---

## Files Included in this Repository

| File                     | Description                                                  |
|--------------------------|--------------------------------------------------------------|
| `toxic_classifier.keras` | Saved Keras v3 model file                                    |
| `tokenizer.json`         | Keras tokenizer for preprocessing                            |
| `config.json`            | Model configuration (architecture, vocab size, labels, etc.) |
| `requirements.txt`       | Python dependencies                                          |
| `README.md`              | This model card                                              |

---

## Example Usage (Python)

```python
import json

import numpy as np
from keras.saving import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Load the tokenizer saved alongside the model
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the Keras v3 model
model = load_model("toxic_classifier.keras")

# Example inference: concatenate the query and the image description
query = "Example user query"
image_desc = "Image describes a dangerous situation"
text = query + " " + image_desc

# Tokenize and pad to the model's expected sequence length (150)
sequence = tokenizer.texts_to_sequences([text])
padded = pad_sequences(sequence, maxlen=150, padding="post", truncating="post")

prediction = model.predict(padded)
predicted_label = int(np.argmax(prediction, axis=1)[0])
print(f"Predicted Label ID: {predicted_label}")
```

## Resources

- [Cellula Internship Project Proposal](#)
- [BLIP: Bootstrapped Language-Image Pre-training](https://github.com/salesforce/BLIP)
- [Llama Guard](https://llama.meta.com/llama-guard/)
- [DistilBERT](https://huggingface.co/distilbert-base-uncased)
- [Streamlit](https://streamlit.io/)

---

## License

MIT License

---

**Author:** Yahya Muhammad Alnwsany
**Contact:** yahyaalnwsany39@gmail.com
**Portfolio:** https://nightprincey.github.io/Portfolio/