---
tags:
- spam
- classification
- bert
- pytorch
- comment-filter
- text-classification
- content-moderation
- social-media
license: mit
language: en
datasets: custom
widget:
- text: "Click here to win a free iPhone!"
- text: "Great video, thanks for sharing!"
- text: "Follow me for daily crypto tips 💰"
- text: "This tutorial saved my life, thank you!"
- text: "🔥 Get rich quick! Limited-time offer!"
---

# Spam Detector: `vibehq/spam-detector`

A BERT-based spam classifier fine-tuned to detect **spam and promotional content** in social media-style comments. It was trained on realistic comment data covering giveaways, scams, promotions, and genuine engagement.

Perfect for content moderation on platforms like:
- YouTube
- Instagram
- Discord
- Reddit
- Facebook
- Forums or blogs

---

## How to Use

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("vibehq/spam-detector")
tokenizer = BertTokenizer.from_pretrained("vibehq/spam-detector")

def predict_spam(comment):
    # Tokenize the comment, padding or truncating it to 128 tokens
    inputs = tokenizer(comment, return_tensors='pt', max_length=128, padding='max_length', truncation=True)
    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the highest-scoring class; index 1 is treated as spam
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Spam" if prediction == 1 else "Non-Spam"

# Example
print(predict_spam("Subscribe to my channel for more giveaways!"))
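
# Optional sketch, not part of the original card: a confidence score.
# predict_spam() above returns only a hard label. When a moderation pipeline
# needs a score to threshold on, the logits can be passed through softmax.
# Assumptions: class index 1 means "Spam" (as in predict_spam above), and the
# helper name spam_probability is hypothetical.
import torch

def spam_probability(logits):
    """Return the softmax probability of class 1 for a single comment's logits."""
    probs = torch.softmax(logits, dim=-1)
    # Select the assumed "Spam" column and convert the single value to a float
    return probs[..., 1].squeeze().item()

# Possible usage with the model and tokenizer loaded above:
#   inputs = tokenizer("Free iPhone, click here!", return_tensors='pt', max_length=128, truncation=True)
#   with torch.no_grad():
#       print(spam_probability(model(**inputs).logits))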