Samanehmoghaddam committed
Commit 0ced76a · verified · 1 Parent(s): 6ca44cd

Update README.md

Files changed (1): README.md (+33 −20)

README.md CHANGED
@@ -17,9 +17,9 @@ license: mit
 
 **AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection**, optimized for **cross-dataset generalization**.
 
-> Abusive language detection models often suffer from poor generalization due to **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **ten publicly available abusive language datasets**, harmonizing labels and preprocessing textual samples to create a **broader and more representative training distribution**.
+> Abusive language detection models often suffer from poor generalization due to **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **publicly available abusive language datasets**, harmonizing labels and preprocessing textual samples to create a **broader and more representative training distribution**.
 
-**Key Findings:**
+**Key Findings using 10 datasets:**
 - Individual dataset models: average F1 = **0.60**
 - Integrated model: F1 = **0.84**
 - Dataset contribution to performance improvements correlates with **lexical diversity (0.71 correlation)**
@@ -46,9 +46,13 @@ Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goe
 ## Intended Use
 
 **Recommended:**
-- Detecting abusive language in text from social media or online platforms
-- Research on bias mitigation and cross-dataset generalization
-- Supporting safe and inclusive online environments
+- Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.
+
+- Supporting research on online harassment, cyber violence, and hate speech analysis.
+
+- Assisting human moderators in content review or flagging potentially harmful content.
+
+- Evaluating trends, prevalence, or patterns of abusive language in large-scale textual datasets.
 
 **Not Recommended:**
 - Fully automated moderation without human oversight
@@ -59,18 +63,27 @@ Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goe
 ## Usage Example
 
 ```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-
-# Load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained("Samanehmoghaddam/AbuseBERT")
-model = AutoModelForSequenceClassification.from_pretrained("Samanehmoghaddam/AbuseBERT")
-
-# Sample input
-text = "Your example text here."
-inputs = tokenizer(text, return_tensors="pt")
-outputs = model(**inputs)
-
-# Predicted label
-predicted_label = torch.argmax(outputs.logits, dim=1).item()
-print(f"Predicted label: {predicted_label}")
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+# Load the model
+model_name = "Samanehmoghaddam/AbuseBERT"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+# Create a pipeline for text classification
+classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+
+# Example texts to classify
+texts = [
+    "@user You are amazing!",
+    "@user You are stupid!",
+]
+
+# Run the classifier
+results = classifier(texts)
+
+# Print results
+for text, result in zip(texts, results):
+    print(f"Text: {text}")
+    print(f"Prediction: {result}")
+    print("-" * 40)
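The model card's summary mentions harmonizing labels across the integrated datasets before training. A minimal sketch of what such a mapping step could look like, assuming illustrative source label names (the actual per-dataset label schemes are not given in this card):

```python
# Sketch of the label-harmonization step the summary describes: heterogeneous
# per-dataset labels mapped onto one binary scheme (1 = abusive, 0 = not).
# The source label names below are illustrative assumptions, not the paper's.

LABEL_MAP = {
    "hateful": 1, "offensive": 1, "abusive": 1, "toxic": 1,
    "neither": 0, "normal": 0, "none": 0,
}

def harmonize(samples):
    """Map (text, source_label) pairs to (text, binary_label); drop unknown labels."""
    out = []
    for text, label in samples:
        mapped = LABEL_MAP.get(label.lower())
        if mapped is not None:
            out.append((text, mapped))
    return out

merged = harmonize([("some insult", "Offensive"), ("hello there", "normal"), ("hmm", "sarcasm")])
```

Dropping unmapped labels (rather than guessing) keeps the merged training distribution consistent across sources.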
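The key findings relate each dataset's contribution to its lexical diversity (0.71 correlation). A sketch of one way such a measurement could be computed, using a simple type-token ratio and a hand-rolled Pearson correlation; the input numbers are placeholders, not the paper's measurements:

```python
import math

# Sketch: quantify lexical diversity as a type-token ratio and correlate it
# with per-dataset F1 gains (Pearson r). All inputs below are placeholders.

def type_token_ratio(texts):
    """Unique tokens divided by total tokens across a dataset's texts."""
    tokens = [tok for t in texts for tok in t.lower().split()]
    return len(set(tokens)) / len(tokens)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder per-dataset diversity scores and F1 improvements
diversity = [0.31, 0.45, 0.52, 0.60]
f1_gain = [0.02, 0.11, 0.15, 0.19]
r = pearson(diversity, f1_gain)
```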
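The updated usage example prints the raw pipeline output. If the model's `config.id2label` only provides generic names such as `LABEL_0`/`LABEL_1` (an assumption; check the config for the real mapping), a small post-processing step can turn results into flags. Mock results stand in for a real pipeline call here:

```python
# Sketch: turn text-classification pipeline output into abusive/not flags.
# The label name "LABEL_1" for the abusive class is an assumption; inspect
# model.config.id2label for the real mapping before relying on it.

def summarize(texts, results, abusive_label="LABEL_1", threshold=0.5):
    """Pair each text with an 'abusive' flag and the pipeline score."""
    report = []
    for text, result in zip(texts, results):
        flagged = result["label"] == abusive_label and result["score"] >= threshold
        report.append({"text": text, "abusive": flagged, "score": result["score"]})
    return report

# Mock results in the same shape the text-classification pipeline returns
mock_results = [
    {"label": "LABEL_0", "score": 0.97},
    {"label": "LABEL_1", "score": 0.91},
]
report = summarize(["@user You are amazing!", "@user You are stupid!"], mock_results)
```

The score threshold is a deployment choice; as the card notes, flagged content should still go to a human reviewer.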