How to use anu56787ty/spam-detector-tfidf-lr with Scikit-learn:
from huggingface_hub import hf_hub_download
import joblib
model = joblib.load(
    hf_hub_download("anu56787ty/spam-detector-tfidf-lr", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html

A lightweight, free spam email/SMS classifier trained on the SMS Spam Collection dataset.
Zero cost to run: no GPU needed, CPU-only inference.
| Metric | Score |
|---|---|
| Accuracy | 98.65% |
| F1 Score (spam) | 94.88% |
| Precision (spam) | 96.53% |
| Recall (spam) | 93.29% |
Evaluated on 1,115 held-out test messages (20% split).
Confusion Matrix:
| | Predicted Ham | Predicted Spam |
|---|---|---|
| Actual Ham | 961 | 5 |
| Actual Spam | 10 | 139 |
Only 10 false negatives (spam missed) and 5 false positives out of 1,115 test samples.
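The reported scores can be re-derived from the confusion matrix above. The short check below is only a sketch of that arithmetic; the variable names are illustrative and not taken from the training code.

```python
# Counts copied from the confusion matrix above.
tn, fp = 961, 5    # actual ham:  predicted ham, predicted spam
fn, tp = 10, 139   # actual spam: predicted ham, predicted spam

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted spam that really is spam
recall = tp / (tp + fn)      # share of actual spam that was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"test messages:    {total}")
print(f"accuracy:         {accuracy:.2%}")
print(f"precision (spam): {precision:.2%}")
print(f"recall (spam):    {recall:.2%}")
print(f"F1 (spam):        {f1:.2%}")
```

The printed values should agree with the metrics table to two decimal places.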
import pickle
from huggingface_hub import hf_hub_download

# Download the two pickled artifacts from the Hub.
# Only unpickle files from sources you trust.
tfidf_path = hf_hub_download("anu56787ty/spam-detector-tfidf-lr", "tfidf_vectorizer.pkl")
clf_path = hf_hub_download("anu56787ty/spam-detector-tfidf-lr", "logistic_regression.pkl")

with open(tfidf_path, "rb") as f:
    tfidf = pickle.load(f)
with open(clf_path, "rb") as f:
    clf = pickle.load(f)

def predict(text):
    """Classify one message and report the probability of the predicted class."""
    vec = tfidf.transform([text])
    pred = clf.predict(vec)[0]          # 1 = spam, 0 = ham
    proba = clf.predict_proba(vec)[0]
    return {"label": "spam" if pred == 1 else "ham", "confidence": round(float(proba[pred]), 3)}

# Try it!
print(predict("Congratulations! You won a FREE iPhone! Click now!"))
# → {'label': 'spam', 'confidence': 0.977}
print(predict("Hey, what time are we meeting for lunch?"))
# → {'label': 'ham', 'confidence': 0.986}
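For more than a handful of messages, it is usually cheaper to vectorize them in one call. The helper below is a hypothetical sketch, not part of the repository: `predict_batch` and `spam_threshold` are names introduced here, and it assumes that column 1 of `predict_proba` is the spam class, matching the `pred == 1` convention in the example above.

```python
def predict_batch(texts, vectorizer, classifier, spam_threshold=0.5):
    """Classify several messages at once.

    Assumption: column 1 of predict_proba corresponds to the spam class.
    """
    texts = list(texts)
    features = vectorizer.transform(texts)            # one matrix for all texts
    probabilities = classifier.predict_proba(features)
    results = []
    for text, row in zip(texts, probabilities):
        spam_probability = float(row[1])
        label = "spam" if spam_probability >= spam_threshold else "ham"
        results.append(
            {"text": text, "label": label, "spam_probability": spam_probability}
        )
    return results
```

With the `tfidf` and `clf` objects loaded above, `predict_batch(messages, tfidf, clf)` returns one dict per message. Raising `spam_threshold` trades recall for precision, which may suit cases where a false positive costs more than a missed spam message.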
Input text
    ↓
TF-IDF Vectorizer
  • max_features=10,000
  • ngram_range=(1,2) → unigrams + bigrams
  • sublinear_tf=True
    ↓
Logistic Regression
  • C=5.0
  • class_weight='balanced' → handles class imbalance
    ↓
Output: ham / spam + confidence score
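To make the three less obvious settings in the pipeline above concrete, here is a small standard-library sketch of what they do. It is a simplification under stated assumptions: tokens are split on whitespace (scikit-learn's default tokenizer uses a regular expression), the IDF factor and normalization are omitted, and the class counts in the usage lines are hypothetical rather than the real training counts.

```python
import math
from collections import Counter

def word_ngrams(text, ngram_range=(1, 2)):
    """Unigrams and bigrams, as selected by ngram_range=(1, 2)."""
    tokens = text.lower().split()  # simplified tokenizer (assumption)
    low, high = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(tokens) - n + 1)
    ]

def sublinear_tf(terms):
    """sublinear_tf=True replaces a raw count tf with 1 + ln(tf)."""
    return {term: 1.0 + math.log(count) for term, count in Counter(terms).items()}

def balanced_class_weights(class_counts):
    """class_weight='balanced': n_samples / (n_classes * count_for_class)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}

terms = word_ngrams("free free prize")
print(terms)
print(sublinear_tf(terms))
# Hypothetical counts, chosen only to show the effect of the weighting.
print(balanced_class_weights({"ham": 900, "spam": 100}))
```

The rarer class receives the larger weight, so errors on spam messages count for more during training than errors on the far more common ham messages.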
| Feature | Value |
|---|---|
| 💰 Cost | Free ($0) |
| ⚡ Speed | < 1ms per prediction |
| 💾 Size | ~2 MB total |
| 🖥️ Hardware | CPU only |
| 📦 Dependencies | scikit-learn, huggingface_hub |
Dataset : ucirvine/sms_spam (5,574 messages)
Train : 4,459 messages
Test : 1,115 messages
Time : < 5 seconds on CPU
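The split sizes above are consistent with a 20% test fraction that is rounded up. The check below assumes that rounding rule, which is how scikit-learn's `train_test_split` handles a fractional `test_size`; the card does not state which splitting utility was actually used.

```python
import math

n_messages = 5574      # dataset size stated above
test_fraction = 0.20   # "20% split" from the evaluation note

# Assumption: the test size is rounded up and the remainder is used for training.
n_test = math.ceil(n_messages * test_fraction)
n_train = n_messages - n_test

print(f"train={n_train}, test={n_test}")
```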
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
Note: this repository holds scikit-learn artifacts (tfidf_vectorizer.pkl and logistic_regression.pkl), not a transformers checkpoint, so AutoTokenizer and AutoModelForCausalLM are not the right loaders for it. Load the model with pickle and huggingface_hub as in the usage example above.