How to use hamza-amin/urdu-spam-classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="hamza-amin/urdu-spam-classifier")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("hamza-amin/urdu-spam-classifier")
model = AutoModelForSequenceClassification.from_pretrained("hamza-amin/urdu-spam-classifier")

This model classifies Urdu text into:
It is designed for AI-powered emergency helpline systems (e.g., 1122/911) to filter prank or irrelevant calls in real time.
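A minimal sketch of how a real-time filter might act on the classifier's scores. The label names `"spam"` and `"emergency"`, the `should_filter` helper, and the 0.95 threshold are all assumptions for illustration, not taken from the model; check `model.config.id2label` for the actual label names. The helper fails open: if the spam label is missing, the call is not filtered.

```python
from typing import Mapping

# Hypothetical label name; the real one should be read from model.config.id2label.
SPAM_LABEL = "spam"


def should_filter(
    scores: Mapping[str, float],
    threshold: float = 0.95,
    spam_label: str = SPAM_LABEL,
) -> bool:
    """Return True only when the spam score reaches the threshold.

    A missing spam label is treated as a score of 0.0, so the call is kept.
    """
    return scores.get(spam_label, 0.0) >= threshold


# Illustrative scores, not real model output.
print(should_filter({"spam": 0.99, "emergency": 0.01}))
print(should_filter({"spam": 0.60, "emergency": 0.40}))
```

A high threshold trades some missed prank calls for fewer wrongly dropped genuine calls, which is usually the safer direction for a helpline.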
hamza-amin/urdu-spam-dataset

| Metric | Score |
|---|---|
| Accuracy | 0.983 |
| F1 Score | 0.982 |
| Precision | 0.993 |
| Recall | 0.972 |
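
As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The `f1_score` helper below is written for this illustration and is not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_score(0.993, 0.972)
print(round(f1, 3))  # 0.982
```

The result agrees with the F1 score listed in the table.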
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hamza-amin/urdu-spam-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict(text):
    # Tokenize the input, run a forward pass, and return class probabilities
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    return probs.tolist()

# "This is an emergency, please help immediately"
print(predict("یہ ایک ایمرجنسی ہے فوراً مدد کریں"))
# "Hello, I was just joking"
print(predict("ہیلو بس مذاق کر رہا تھا"))
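
`predict` returns a nested list of raw probabilities, one row per input, with no label names attached. A small sketch of how each row could be paired with labels is shown below. The `label_probs` helper and the placeholder mapping are assumptions for illustration; in practice the mapping would come from `model.config.id2label`.

```python
from typing import Dict, List, Mapping


def label_probs(row: List[float], id2label: Mapping[int, str]) -> Dict[str, float]:
    """Pair each probability in one output row with its class label."""
    return {id2label[i]: p for i, p in enumerate(row)}


# Placeholder mapping and probabilities; the real mapping is model.config.id2label.
example_id2label = {0: "LABEL_0", 1: "LABEL_1"}
print(label_probs([0.02, 0.98], example_id2label))
```

With the model loaded as above, the call would look like `label_probs(predict(text)[0], model.config.id2label)`.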
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 0.1146 | 1.0 | 169 | 0.1139 | 0.9834 | 0.9826 | 0.9860 | 0.9792 |
| 0.0902 | 2.0 | 338 | 0.0954 | 0.9834 | 0.9825 | 0.9929 | 0.9722 |
| 0.0085 | 3.0 | 507 | 0.0972 | 0.9867 | 0.9859 | 1.0 | 0.9722 |
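
The table shows that the best epoch depends on the selection criterion: validation loss is lowest at epoch 2, while accuracy and F1 peak at epoch 3. The sketch below picks a row by either criterion using values copied from the table. Which checkpoint was actually published is not stated here, so this is only an illustration of the trade-off.

```python
# Rows copied from the training results table: (epoch, val_loss, accuracy, f1)
rows = [
    (1, 0.1139, 0.9834, 0.9826),
    (2, 0.0954, 0.9834, 0.9825),
    (3, 0.0972, 0.9867, 0.9859),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_f1 = max(rows, key=lambda r: r[3])    # highest F1

print("lowest validation loss: epoch", best_by_loss[0])
print("highest F1: epoch", best_by_f1[0])
```

The headline metrics reported earlier match the epoch 2 row after rounding. Training loss keeps falling through epoch 3 while validation loss rises slightly, which may be an early sign of overfitting.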
Trained on synthetic data
May struggle with:
License: MIT

Base model: FacebookAI/xlm-roberta-base