---
tags:
- spam
- classification
- bert
- pytorch
- comment-filter
- text-classification
- content-moderation
- social-media
license: mit
language: en
datasets: custom
widget:
- text: "Click here to win a free iPhone!"
- text: "Great video, thanks for sharing!"
- text: "Follow me for daily crypto tips 💰"
- text: "This tutorial saved my life, thank you!"
- text: "🔥 Get rich quick! Limited-time offer!"
---

# Spam Detector: `vibehq/spam-detector`

A BERT-based spam classifier fine-tuned to detect **spam and promotional content** in social-media-style comments. It was trained on realistic comment data, including giveaways, scams, promotions, and genuine engagement.

Well suited to content moderation on platforms such as:

- YouTube
- Instagram
- Discord
- Reddit
- Facebook
- Forums or blogs

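
On platforms like these, a moderation pipeline often wants a confidence score rather than a hard label, so that borderline comments can go to human review. The sketch below is one way to do that and is not part of the model's API: `spam_probability`, `route_comment`, and both thresholds are hypothetical names and values chosen for illustration. It assumes the classifier emits two logits per comment, with index 1 meaning spam. Under that assumption, real model output could be passed in as `outputs.logits[0].tolist()`; the logits here are written by hand so the example runs without downloading the model.

```python
import math


def spam_probability(logits):
    """Turn [non_spam_logit, spam_logit] into P(spam) with a stable softmax."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def route_comment(logits, block_at=0.9, review_at=0.5):
    """Map logits to a moderation action. Thresholds are illustrative only."""
    p = spam_probability(logits)
    if p >= block_at:
        return "block"
    if p >= review_at:
        return "review"
    return "allow"


# Hand-written logits standing in for real model output.
print(route_comment([-2.0, 3.0]))   # block
print(route_comment([0.1, 0.4]))    # review
print(route_comment([2.5, -1.0]))   # allow
```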
---

## How to Use

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("vibehq/spam-detector")
tokenizer = BertTokenizer.from_pretrained("vibehq/spam-detector")

def predict_spam(comment):
    # Tokenize, padding or truncating to 128 tokens
    inputs = tokenizer(comment, return_tensors='pt', max_length=128, padding='max_length', truncation=True)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    # Class index 1 is spam, anything else is non-spam
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Spam" if prediction == 1 else "Non-Spam"

# Example
print(predict_spam("Subscribe to my channel for more giveaways!"))