github-actions committed
Commit 4ff97c3 · 1 parent: b5ce60a

Update Space

Files changed (5)
  1. .gitattributes +0 -35
  2. README.MD +66 -0
  3. README.md +0 -14
  4. app.py +46 -0
  5. requirements.txt +4 -0
.gitattributes DELETED
@@ -1,35 +0,0 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
README.MD ADDED
@@ -0,0 +1,66 @@
+ ---
+ title: Toxic Comment Detector
+ emoji: 🚨
+ colorFrom: red
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: "4.44.0"
+ python_version: "3.10"
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Toxic Comment Detection
+
+ This Space demonstrates a **toxic comment detection system** built using
+ **classical machine learning techniques** and deployed with **Gradio**.
+
+ ---
+
+ ## Overview
+
+ Given a text comment, the model predicts whether it is:
+ - Toxic
+ - Non-toxic
+
+ along with a confidence score.
+
+ This project focuses on a **clean ML pipeline and deployment workflow** rather than large pretrained models.
+
+ ---
+
+ ## Dataset
+
+ - Google Civil Comments Toxicity dataset
+ - Continuous toxicity scores converted into binary labels
+ - Subsampled for efficient training
+
+ ---
+
+ ## Model
+
+ - TF-IDF features (word n-grams)
+ - Logistic Regression (scikit-learn)
+ - Class-weighted to handle imbalance
+ - CPU-only inference
+
+ ---
+
+ ## Deployment
+
+ - Gradio-based user interface
+ - Hosted on Hugging Face Spaces
+ - Model artifacts loaded at runtime
+ - No GPU required
+
+ ---
+
+ ## Notes
+
+ This demo is intended for **educational purposes** and should not be used as a standalone moderation system.
+
+ ---
+
+ ## License
+
+ MIT
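The Dataset and Model sections of the added README mention converting continuous toxicity scores into binary labels and weighting classes to handle imbalance, but the training code is not part of this commit. A minimal sketch of those two steps follows. It assumes a 0.5 cutoff (the README does not state the threshold) and the common "balanced" heuristic `n_samples / (n_classes * class_count)`; the function names `binarize` and `balanced_class_weights` are hypothetical.

```python
from collections import Counter


def binarize(scores, threshold=0.5):
    """Map continuous toxicity scores in [0, 1] to 0/1 labels.

    The 0.5 threshold is an assumption; the README does not state one.
    """
    return [1 if s >= threshold else 0 for s in scores]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy scores: six clearly non-toxic comments, two toxic ones.
scores = [0.0, 0.1, 0.2, 0.05, 0.3, 0.0, 0.9, 0.6]
labels = binarize(scores)
weights = balanced_class_weights(labels)
print(labels)
print(weights)
```

In scikit-learn, passing `class_weight="balanced"` to `LogisticRegression` applies the same formula. Whether the author used that setting or custom weights is not shown in this commit.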
README.md DELETED
@@ -1,14 +0,0 @@
- ---
- title: Toxic Comment Detector
- emoji: 💻
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- sdk_version: 6.5.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: Toxic comment detection system built using ML techniques
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,46 @@
+ import re
+ import joblib
+ import gradio as gr
+ from huggingface_hub import hf_hub_download
+
+ # Download model artifacts from HF Model Hub
+ tfidf_path = hf_hub_download(
+     repo_id="harishsahadev/toxic-comment-detector-classical",
+     filename="tfidf_vectorizer.joblib"
+ )
+
+ model_path = hf_hub_download(
+     repo_id="harishsahadev/toxic-comment-detector-classical",
+     filename="toxic_classifier.joblib"
+ )
+
+ tfidf = joblib.load(tfidf_path)
+ model = joblib.load(model_path)
+
+
+ def clean_text(text):
+     text = text.lower()
+     text = re.sub(r"http\S+|www\S+", "", text)
+     text = re.sub(r"[^a-z\s]", " ", text)
+     return re.sub(r"\s+", " ", text).strip()
+
+
+ def predict(text):
+     vec = tfidf.transform([clean_text(text)])
+     prob = model.predict_proba(vec)[0][1]
+     return {
+         "label": "Toxic" if prob >= 0.5 else "Non-Toxic",
+         "toxicity_probability": round(float(prob), 4),
+     }
+
+
+ demo = gr.Interface(
+     fn=predict,
+     inputs=gr.Textbox(lines=4),
+     outputs="json",
+     title="Toxic Comment Detection",
+     cache_examples=False,
+ )
+
+ if __name__ == "__main__":
+     demo.launch(server_name="0.0.0.0", server_port=7860)
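The preprocessing and thresholding in app.py can be exercised without downloading the model artifacts. The sketch below copies `clean_text` from the diff and factors the dictionary built inside `predict` into a hypothetical helper, `to_result`, so the behaviour can be checked on a few inputs. The sample strings are illustrative only.

```python
import re


def clean_text(text):
    # Same steps as app.py: lowercase, drop URLs, keep only a-z and whitespace.
    text = text.lower()
    text = re.sub(r"http\S+|www\S+", "", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_result(prob, threshold=0.5):
    # Hypothetical helper mirroring the dict returned by predict() in app.py.
    return {
        "label": "Toxic" if prob >= threshold else "Non-Toxic",
        "toxicity_probability": round(float(prob), 4),
    }


print(clean_text("Check THIS out: https://example.com/x?y=1 !!!"))  # check this out
print(clean_text("You're 100% wrong"))  # you re wrong
print(clean_text("awwwesome"))  # a
print(to_result(0.5))
```

The last `clean_text` call shows that the `www\S+` alternative is unanchored, so it also matches inside ordinary words. Digits, punctuation, and non-ASCII letters are all replaced by spaces. Any tightening of these patterns would presumably need the same change in the training-time preprocessing, which is not part of this commit.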
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio==4.44.0
+ huggingface_hub==0.20.3
+ scikit-learn==1.6.1
+ joblib
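joblib is the only entry above without an exact pin, while app.py uses it to load serialized scikit-learn objects, which are generally expected to be loaded with library versions compatible with the ones that produced them. A small sketch for flagging unpinned lines follows. The helper name `unpinned` is hypothetical, and it only handles simple `name==version` lines, not extras, markers, or other specifiers.

```python
def unpinned(requirements_text):
    """Return requirement lines that lack an exact '==' pin."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            names.append(line)
    return names


# Contents of requirements.txt as added in this commit.
reqs = """gradio==4.44.0
huggingface_hub==0.20.3
scikit-learn==1.6.1
joblib
"""
print(unpinned(reqs))  # ['joblib']
```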