Michielo committed (verified)
Commit 2a597ea · 1 parent: 336908c

Rename README.md to Added usage & limitations

README.md → Added usage & limitations (renamed)
@@ -11,7 +11,7 @@ A tiny comment toxicity classifier model at only 2M parameters. With only ~10MB
 A paper on this model is being released soon.
 
 
-### Benchmarks
+## Benchmarks
 
 The Tiny-Toxic-Detector achieves an impressive 90.26% on the Toxigen benchmark and 87.34% on the Jigsaw-Toxic-Comment-Classification-Challenge. Here we compare our results against other toxic classification models:
 
@@ -26,7 +26,7 @@ The Tiny-Toxic-Detector achieves an impressive 90.26% on the Toxigen benchmark a
 | **Tiny-toxic-detector** | **2M** | **90.26** | 87.34 |
 
 
-### Usage
+## Usage
 This model uses a custom architecture and requires some extra custom code to work. Below you can find the architecture and a fully-usable example.
 
 <details>
@@ -203,3 +203,25 @@ with torch.no_grad():
     logits = outputs["logits"].squeeze()
     prediction = "Toxic" if logits > 0.5 else "Not Toxic"
 ```
+
+
+## Usage and Limitations
+
+Toxicity classification models always have certain limitations you should be aware of, and this model is no different.
+
+### Intended Usage
+
+The Tiny-toxic-detector is designed to classify comments for toxicity. It is particularly useful in scenarios where minimal resource usage and rapid inference are essential. Key features include:
+* Low Resource Consumption: requiring only roughly 10MB of RAM and 8MB of VRAM, the model is well-suited for environments with limited hardware resources.
+* Fast Inference: the model significantly outperforms larger models on CPU-based systems. Because GPU inference adds launch and transfer overhead, small models processing relatively few input tokens often run faster on CPU, and the Tiny-toxic-detector is no exception (a timing sketch follows the diff).
+
+### Limitations
+
+* Training Data
+  * The Tiny-toxic-detector has been trained exclusively on English-language data, limiting its ability to classify toxicity in other languages.
+* Maximum Context Length
+  * The model can handle up to 512 input tokens; comments exceeding this length are outside the scope of this model (a truncation sketch follows the diff).
+  * While extending the context length is possible, such modifications have not been trained for or validated. Early tests with a 4096-token context resulted in a performance drop of over 10% on the Toxigen benchmark.
+* Language Ambiguity
+  * The Tiny-toxic-detector may struggle with ambiguous or nuanced language, as any model would. Even though benchmarks like Toxigen evaluate performance on ambiguous language, the model may still misclassify comments where toxicity is not clearly defined.
+
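The full usage example lives in the collapsed `<details>` block that the last hunk only tails. As a minimal sketch of the inference flow those context lines show, assuming `model` and `tokenizer` are the objects built by that example (the custom architecture class itself is not reproduced here):

```python
# Minimal sketch of the inference flow from the last hunk. Assumes `model` and
# `tokenizer` were constructed by the README's collapsed usage example; the
# custom architecture class is defined there, not here.
import torch

def classify(model, tokenizer, text: str) -> str:
    """Label a single comment as Toxic / Not Toxic."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    # Mirrors the thresholding shown in the diff: one squeezed logit vs. 0.5.
    logits = outputs["logits"].squeeze()
    return "Toxic" if logits > 0.5 else "Not Toxic"
```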
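For the 512-token limit, truncation can be enforced at tokenization time. A small sketch, using `bert-base-uncased` purely as a stand-in tokenizer (in practice, load the tokenizer that ships with the model):

```python
# Hedged sketch: keeping inputs within the 512-token context limit.
# bert-base-uncased is a stand-in; use the model's own tokenizer in practice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in

comment = "this comment repeats itself " * 300   # far longer than 512 tokens
enc = tokenizer(
    comment,
    truncation=True,     # drop everything past max_length
    max_length=512,      # the model's maximum context length
    return_tensors="pt",
)
print(enc["input_ids"].shape)   # -> torch.Size([1, 512])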
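And for the CPU-speed claim, latency is easy to measure directly. A sketch of a simple wall-clock benchmark, assuming `model` and `enc` from the sketches above; nothing here is specific to this model:

```python
# Hedged sketch: measuring mean CPU inference latency with a warm-up run.
import time
import torch

def mean_latency(model, inputs, runs: int = 50) -> float:
    """Return the mean per-call inference time in seconds."""
    model.eval()
    with torch.no_grad():
        model(**inputs)                      # warm-up (allocations, caching)
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
    return (time.perf_counter() - start) / runs

# Example, with the objects from the sketches above:
# print(f"mean CPU latency: {mean_latency(model, enc) * 1e3:.2f} ms")
```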