fc63 committed
Commit 1e1a0de · verified · 1 Parent(s): 5fb930b

Update README.md

Update Model Card

Files changed (1)
  1. README.md +63 -56
README.md CHANGED
@@ -1,57 +1,64 @@
- ---
- {}
- ---
-
- Class 0: Insult
- Class 1: Other
- Class 2: PROFANITY
- Class 3: Racist
- Class 4: Sexist
-
- # Model Card for Toxicity Detection Model
-
- This model is a fine-tuned version of bert-base-uncased for toxicity detection in Turkish text. It has been trained on labeled datasets containing online comments categorized by their toxicity levels. The model uses the Hugging Face transformers library and is suitable for sequence classification tasks. This work was completed as a project assignment for the Natural Language Processing (CENG493) course at Çankaya University.
-
- - **Model Type:** Sequence Classification
- - **Language(s):** Turkish
- - **License:** GNU GENERAL PUBLIC LICENSE
- - **Fine-tuned from:** `dbmdz/bert-base-turkish-cased`
-
- ## Uses
-
- This model can be used directly to analyze the toxicity of text in English. For example:
-
- - Content moderation in online forums and social media platforms
- - Filtering harmful language in customer reviews or feedback
- - Monitoring and preventing cyberbullying in messaging applications
-
- ### Downstream Use
-
- - Integrating toxic language filtering into chatbots or virtual assistants
- - Using it as part of a sentiment analysis pipeline
-
-
- ### Out-of-Scope Use
-
- - Not suitable for analyzing languages other than Turkish
- - Should not be used for sensitive decision-making without human oversight
-
-
- ## Bias, Risks, and Limitations
-
- The model may inherit biases from the training data, including overrepresentation or underrepresentation of certain demographics or topics. It may also misclassify non-toxic content as toxic or fail to detect subtler forms of toxicity.
-
- ### Recommendations
-
- Users should:
-
- - Avoid deploying the model in high-stakes scenarios without additional validation.
- - Regularly monitor performance and update the model if new biases are detected.
-
- ### Training Data
-
- https://huggingface.co/datasets/Overfit-GM/turkish-toxic-language
-
- ## Evaluation
-
+ ---
+ license: gpl-3.0
+ datasets:
+ - Overfit-GM/turkish-toxic-language
+ language:
+ - tr
+ base_model:
+ - dbmdz/bert-base-turkish-cased
+ pipeline_tag: text-classification
+ ---
+
+ - Class 0: Insult
+ - Class 1: Other
+ - Class 2: Profanity
+ - Class 3: Racist
+ - Class 4: Sexist
+
+ # Model Card for Toxicity Detection Model
+
+ This model is a fine-tuned version of `dbmdz/bert-base-turkish-cased` for toxicity detection in Turkish text. It was trained on a labeled dataset of online comments categorized by toxicity type. The model uses the Hugging Face transformers library and is suitable for sequence classification tasks (a minimal loading sketch follows the details below). This work was completed as a project assignment for the Natural Language Processing (CENG493) course at Çankaya University.
+
+ - **Model Type:** Sequence Classification
+ - **Language(s):** Turkish
+ - **License:** GPL-3.0
+ - **Fine-tuned from:** `dbmdz/bert-base-turkish-cased`
+
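+ The sketch below shows one way to load the model with the transformers library. The repository id is a hypothetical placeholder (this card does not state it), and the label mapping in the comment assumes the class list at the top of the card.
+
+ ```python
+ # Minimal loading sketch. MODEL_ID is a hypothetical placeholder -
+ # replace it with this repository's actual model id.
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ MODEL_ID = "fc63/turkish-toxicity-model"  # hypothetical placeholder
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
+
+ # If the config follows the class list above, this prints:
+ # {0: "Insult", 1: "Other", 2: "Profanity", 3: "Racist", 4: "Sexist"}
+ print(model.config.id2label)
+ ```
+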
+ ## Uses
+
+ This model can be used directly to analyze the toxicity of Turkish text (a minimal inference sketch follows the list below). For example:
+
+ - Content moderation in online forums and social media platforms
+ - Filtering harmful language in customer reviews or feedback
+ - Monitoring and preventing cyberbullying in messaging applications
+
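+ A minimal inference sketch using the text-classification pipeline; the model id is the same hypothetical placeholder as above, and the printed output is illustrative, not a published result.
+
+ ```python
+ from transformers import pipeline
+
+ # Hypothetical placeholder id - substitute this repository's id.
+ classifier = pipeline("text-classification", model="fc63/turkish-toxicity-model")
+
+ # Turkish for "This movie was wonderful." - a non-toxic example.
+ print(classifier("Bu film harikaydı."))
+ # Illustrative output: [{'label': 'Other', 'score': 0.97}]
+ ```
+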
+ ### Downstream Use
+
+ - Integrating toxic language filtering into chatbots or virtual assistants (see the sketch after this list)
+ - Using it as part of a sentiment analysis pipeline
+
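+ A hedged sketch of one way such a filter could sit in front of a chatbot, reusing the `classifier` pipeline from the sketch above; the label names assume the class list at the top of this card.
+
+ ```python
+ def is_toxic(message: str, threshold: float = 0.5) -> bool:
+     """Flag a message whose predicted class is anything other than 'Other'."""
+     prediction = classifier(message)[0]
+     return prediction["label"] != "Other" and prediction["score"] >= threshold
+
+ for message in ["Bu film harikaydı.", "başka bir örnek mesaj"]:
+     # In a real application: hide, warn, or escalate flagged messages.
+     print(message, "->", "flagged" if is_toxic(message) else "ok")
+ ```
+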
+ ### Out-of-Scope Use
+
+ - Not suitable for analyzing languages other than Turkish
+ - Should not be used for sensitive decision-making without human oversight
+
+ ## Bias, Risks, and Limitations
+
+ The model may inherit biases from the training data, including overrepresentation or underrepresentation of certain demographics or topics. It may also misclassify non-toxic content as toxic or fail to detect subtler forms of toxicity.
+
+ ### Recommendations
+
+ Users should:
+
+ - Avoid deploying the model in high-stakes scenarios without additional validation.
+ - Regularly monitor performance and update the model if new biases are detected.
+
+ ## Training Data
+
+ https://huggingface.co/datasets/Overfit-GM/turkish-toxic-language
+
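+ The dataset can be pulled from the Hub with the datasets library; split and column names are not stated on this card, so inspect the returned object rather than assuming them.
+
+ ```python
+ from datasets import load_dataset
+
+ # Loads the training data referenced above.
+ dataset = load_dataset("Overfit-GM/turkish-toxic-language")
+ print(dataset)  # shows the available splits and columns
+ ```
+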
+ ## Evaluation
+
  The model was evaluated on a held-out test set containing a balanced mix of toxic and non-toxic examples.
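+ A hedged sketch of how such an evaluation could be reproduced with scikit-learn; the held-out texts, gold labels, and label-to-id mapping below are stand-ins, since the card does not publish the actual split or metric values.
+
+ ```python
+ from sklearn.metrics import classification_report
+
+ # Assumed mapping, following the class list at the top of this card.
+ id_by_label = {"Insult": 0, "Other": 1, "Profanity": 2, "Racist": 3, "Sexist": 4}
+
+ held_out_texts = ["Bu film harikaydı."]  # stand-in for the real test texts
+ held_out_gold = [1]                      # stand-in gold indices ("Other")
+
+ # `classifier` is the pipeline from the Uses section.
+ predicted = [id_by_label[p["label"]] for p in classifier(held_out_texts)]
+ print(classification_report(held_out_gold, predicted, zero_division=0))
+ ```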