---
license: apache-2.0
base_model: distilbert-base-uncased
model_name: Diffguard
paper_link: https://arxiv.org/pdf/2412.00064
pipeline_tag: text-classification
tags:
  - Transformers
  - PyTorch
  - safety
  - inappropriate
  - distilbert
  - DiffGuard
language:
  - en
datasets:
  - eliasalbouzidi/NSFW-Safe-Dataset
metrics:
  - f1
  - accuracy
  - precision
  - recall
widget:
  - text: A family hiking in the mountains
    example_title: Safe
  - text: A child playing with a puppy
    example_title: Safe
  - text: A couple kissing passionately in bed
    example_title: Nsfw
  - text: A woman naked
    example_title: Nsfw
  - text: A man killing people
    example_title: Nsfw
  - text: A mass shooting
    example_title: Nsfw
model-index:
  - name: Diffguard
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: NSFW-Safe-Dataset
          type: eliasalbouzidi/NSFW-Safe-Dataset
        metrics:
          - name: F1
            type: f1
            value: 0.974
          - name: Accuracy
            type: accuracy
            value: 0.98
---

# Model Card

<!-- Provide a quick summary of what the model is/does. -->

This model classifies text into two classes, "safe" or "nsfw" (not safe for work), which makes it suitable for content moderation and filtering applications.

The model was trained on a dataset of 190,000 labeled text samples distributed across the two classes, "safe" and "nsfw".

The model is based on DistilBERT (distilbert-base-uncased).

In terms of performance, the model achieves an F1 score of 0.974 (evaluated on 40K examples).

For best results, preprocess the input text before classification. You can refer to the preprocess function in the app.py file of the following Space: <https://huggingface.co/spaces/eliasalbouzidi/distilbert-nsfw-text-classifier>.

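The exact preprocessing lives in the linked Space's app.py and is not reproduced here; purely as an illustration, a light normalization step for this kind of classifier might look like the sketch below (hypothetical helper, not the authors' function):

```python
import re

def simple_preprocess(text: str) -> str:
    """Hypothetical example of light text normalization; the real
    preprocess function in the Space's app.py may differ."""
    text = text.lower()                       # case-fold
    text = re.sub(r"http\S+", " ", text)      # drop URLs
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

print(simple_preprocess("  A FAMILY  hiking http://example.com in the mountains "))
```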
### Model Description

The model can be used directly to classify text into one of the two classes. It takes a string of text as input and outputs a probability distribution over the two classes; the class with the highest probability is selected as the prediction.

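A minimal sketch of that inference step (checkpoint name taken from the usage section below; label names come from the model config):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "eliasalbouzidi/distilbert-nsfw-text-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("A family hiking in the mountains", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # raw scores for the two classes
probs = torch.softmax(logits, dim=-1)[0]       # probability distribution
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], float(probs[pred_id]))
```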
- **Developed by:** Elias Al Bouzidi, Massine El Khader, Abdellah Oumida, Mohammed Sbaihi, Eliott Binard
- **Model type:** Text classification (DistilBERT-based, ~60M parameters)
- **Language (NLP):** English
- **License:** apache-2.0

### Uses

The model can be integrated into larger systems for content moderation or filtering; a minimal gating sketch follows below.

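A sketch of such a gate, using the pipeline shown later in this card (the threshold and the helper name are illustrative choices, not part of the model):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="eliasalbouzidi/distilbert-nsfw-text-classifier")

def is_safe_prompt(prompt: str, threshold: float = 0.8) -> bool:
    """Illustrative gate: reject a prompt when the top label is 'nsfw'
    with at least `threshold` confidence (label strings per model config)."""
    result = classifier(prompt)[0]  # {'label': ..., 'score': ...}
    return not (result["label"].lower() == "nsfw" and result["score"] >= threshold)

if is_safe_prompt("A family hiking in the mountains"):
    print("prompt accepted")
else:
    print("prompt rejected")
```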
### Training Data

The training data for fine-tuning the text classification model consists of a large corpus of text labeled with one of the two classes, "safe" and "nsfw". The dataset contains a total of 190,000 examples, distributed as follows:

- 117,000 examples labeled as "safe"
- 63,000 examples labeled as "nsfw"

It was assembled by scraping data from the web and by drawing on existing open-source datasets. A significant portion of the dataset consists of descriptions of images and scenes. The primary objective was to prevent diffusion models from generating NSFW content, but the model can be used for other moderation purposes.

You can access the dataset here: https://huggingface.co/datasets/eliasalbouzidi/NSFW-Safe-Dataset

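To load it with the datasets library (split and column names are not listed in this card, so inspect the returned object):

```python
from datasets import load_dataset

ds = load_dataset("eliasalbouzidi/NSFW-Safe-Dataset")
print(ds)  # shows available splits and columns
```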
### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 600
- num_epochs: 3
- mixed_precision_training: Native AMP
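A minimal sketch, assuming the Hugging Face Trainer API (the output directory name is a placeholder; data loading, tokenization, and the Trainer call itself are omitted):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; the default AdamW betas
# (0.9, 0.999) and epsilon (1e-8) already match the reported values.
training_args = TrainingArguments(
    output_dir="nsfw-text-classifier",   # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=600,
    num_train_epochs=3,
    fp16=True,                           # native AMP mixed precision
)
```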

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | F-beta 1.6 | False positive rate | False negative rate | Precision | Recall |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:------:|:----------:|:-------------------:|:-------------------:|:---------:|:------:|
| 0.3367 | 0.0998 | 586 | 0.1227 | 0.9586 | 0.9448 | 0.9447 | 0.0331 | 0.0554 | 0.9450 | 0.9446 |
| 0.0998 | 0.1997 | 1172 | 0.0919 | 0.9705 | 0.9606 | 0.9595 | 0.0221 | 0.0419 | 0.9631 | 0.9581 |
| 0.0896 | 0.2995 | 1758 | 0.0900 | 0.9730 | 0.9638 | 0.9600 | 0.0163 | 0.0448 | 0.9724 | 0.9552 |
| 0.087 | 0.3994 | 2344 | 0.0820 | 0.9743 | 0.9657 | 0.9646 | 0.0191 | 0.0367 | 0.9681 | 0.9633 |
| 0.0806 | 0.4992 | 2930 | 0.0717 | 0.9752 | 0.9672 | 0.9713 | 0.0256 | 0.0235 | 0.9582 | 0.9765 |
| 0.0741 | 0.5991 | 3516 | 0.0741 | 0.9753 | 0.9674 | 0.9712 | 0.0251 | 0.0240 | 0.9589 | 0.9760 |
| 0.0747 | 0.6989 | 4102 | 0.0689 | 0.9773 | 0.9697 | 0.9696 | 0.0181 | 0.0305 | 0.9699 | 0.9695 |
| 0.0707 | 0.7988 | 4688 | 0.0738 | 0.9781 | 0.9706 | 0.9678 | 0.0137 | 0.0356 | 0.9769 | 0.9644 |
| 0.0644 | 0.8986 | 5274 | 0.0682 | 0.9796 | 0.9728 | 0.9708 | 0.0135 | 0.0317 | 0.9773 | 0.9683 |
| 0.0688 | 0.9985 | 5860 | 0.0658 | 0.9798 | 0.9730 | 0.9718 | 0.0144 | 0.0298 | 0.9758 | 0.9702 |
| 0.0462 | 1.0983 | 6446 | 0.0682 | 0.9800 | 0.9733 | 0.9723 | 0.0146 | 0.0290 | 0.9756 | 0.9710 |
| 0.0498 | 1.1982 | 7032 | 0.0706 | 0.9800 | 0.9733 | 0.9717 | 0.0138 | 0.0303 | 0.9768 | 0.9697 |
| 0.0484 | 1.2980 | 7618 | 0.0773 | 0.9797 | 0.9728 | 0.9696 | 0.0117 | 0.0345 | 0.9802 | 0.9655 |
| 0.0483 | 1.3979 | 8204 | 0.0676 | 0.9800 | 0.9734 | 0.9742 | 0.0172 | 0.0248 | 0.9715 | 0.9752 |
| 0.0481 | 1.4977 | 8790 | 0.0678 | 0.9798 | 0.9731 | 0.9737 | 0.0170 | 0.0255 | 0.9717 | 0.9745 |
| 0.0474 | 1.5975 | 9376 | 0.0665 | 0.9782 | 0.9713 | 0.9755 | 0.0234 | 0.0191 | 0.9618 | 0.9809 |
| 0.0432 | 1.6974 | 9962 | 0.0691 | 0.9787 | 0.9718 | 0.9748 | 0.0213 | 0.0213 | 0.9651 | 0.9787 |
| 0.0439 | 1.7972 | 10548 | 0.0683 | 0.9811 | 0.9748 | 0.9747 | 0.0150 | 0.0254 | 0.9750 | 0.9746 |
| 0.0442 | 1.8971 | 11134 | 0.0710 | 0.9809 | 0.9744 | 0.9719 | 0.0118 | 0.0313 | 0.9802 | 0.9687 |
| 0.0425 | 1.9969 | 11720 | 0.0671 | 0.9810 | 0.9747 | 0.9756 | 0.0165 | 0.0232 | 0.9726 | 0.9768 |
| 0.0299 | 2.0968 | 12306 | 0.0723 | 0.9802 | 0.9738 | 0.9758 | 0.0187 | 0.0217 | 0.9692 | 0.9783 |
| 0.0312 | 2.1966 | 12892 | 0.0790 | 0.9804 | 0.9738 | 0.9731 | 0.0146 | 0.0279 | 0.9755 | 0.9721 |
| 0.0266 | 2.2965 | 13478 | 0.0840 | 0.9815 | 0.9752 | 0.9728 | 0.0115 | 0.0302 | 0.9806 | 0.9698 |
| 0.0277 | 2.3963 | 14064 | 0.0742 | 0.9808 | 0.9746 | 0.9770 | 0.0188 | 0.0199 | 0.9690 | 0.9801 |
| 0.0294 | 2.4962 | 14650 | 0.0764 | 0.9809 | 0.9747 | 0.9765 | 0.0179 | 0.0211 | 0.9705 | 0.9789 |
| 0.0304 | 2.5960 | 15236 | 0.0795 | 0.9811 | 0.9748 | 0.9742 | 0.0142 | 0.0266 | 0.9763 | 0.9734 |
| 0.0287 | 2.6959 | 15822 | 0.0783 | 0.9814 | 0.9751 | 0.9741 | 0.0134 | 0.0272 | 0.9775 | 0.9728 |
| 0.0267 | 2.7957 | 16408 | 0.0805 | 0.9814 | 0.9751 | 0.9740 | 0.0133 | 0.0274 | 0.9777 | 0.9726 |
| 0.0318 | 2.8956 | 16994 | 0.0767 | 0.9814 | 0.9752 | 0.9756 | 0.0154 | 0.0240 | 0.9744 | 0.9760 |
| 0.0305 | 2.9954 | 17580 | 0.0779 | 0.9815 | 0.9753 | 0.9751 | 0.0146 | 0.0251 | 0.9757 | 0.9749 |

We selected the checkpoint with the highest F-beta (β = 1.6) score, which weights recall more heavily than precision.

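For reference, the F-beta score combines precision and recall as:

$$F_\beta = (1 + \beta^2)\cdot\frac{\mathrm{precision}\cdot\mathrm{recall}}{\beta^2\cdot\mathrm{precision} + \mathrm{recall}}$$
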
### Framework versions

- Transformers 4.40.1
- PyTorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1

### Out-of-Scope Use

The model should not be used for any illegal activities.

## Bias, Risks, and Limitations

The model may exhibit biases based on the training data used. It may not perform well on text that is written in languages other than English. It may also struggle with sarcasm, irony, or other forms of figurative language. The model may produce false positives or false negatives, which could lead to incorrect categorization of text.

### Recommendations

Users should be aware of the limitations and biases of the model and use it accordingly. They should also be prepared to handle false positives and false negatives. It is recommended to fine-tune the model for specific downstream tasks and to evaluate its performance on relevant datasets.

### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
```

### Use a pipeline
```python
from transformers import pipeline

pipe = pipeline("text-classification", model="eliasalbouzidi/distilbert-nsfw-text-classifier")
```
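Continuing from the block above, calling the pipeline returns one dict per input with a label and a score (the exact label strings come from the model config):

```python
result = pipe("A family hiking in the mountains")
print(result)  # e.g. [{'label': '<safe-or-nsfw label>', 'score': 0.99}]
```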

## Contact

Please reach out to eliasalbouzidi@gmail.com if you have any questions or feedback.