---
license: mit
tags:
- spam
- text-classification
- scikit-learn
- tfidf
- spaCy
- logistic-regression
language: en
datasets: custom
model-index:
- name: Spam Classifier (Scikit-learn + spaCy)
  results: []
---
# 📧 Spam Classifier (Scikit-learn + spaCy)
This model classifies messages as **spam** or **ham** (legitimate, non-spam) using traditional NLP techniques.
## 🧠 Model Details
- **Preprocessing**: Tokenization + Lemmatization using spaCy
- **Vectorization**: TF-IDF (1-2 grams)
- **Feature Selection**: Chi-squared (chi2) test, keeping the top 1000 features
- **Model**: Logistic Regression (`class_weight="balanced"`, `max_iter=1000`)
- **Performance**: ~87% accuracy on a balanced test set (800 spam, 800 ham)
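
The steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the actual training script: `build_pipeline` is a hypothetical helper, the toy messages are invented for illustration, and the spaCy lemmatization step is left out so the example runs without downloading a spaCy model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline(k=1000):
    """Hypothetical helper: TF-IDF (1-2 grams) -> chi2 top-k -> logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("chi2", SelectKBest(chi2, k=k)),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


# Toy data, invented for illustration only (not the model's training set).
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "lunch tomorrow sounds good",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# k is kept small here because the toy vocabulary has far fewer than 1000 features.
pipeline = build_pipeline(k=5)
pipeline.fit(texts, labels)
predictions = pipeline.predict(texts)
```

On a real corpus, `k=1000` would match the setting listed above. The text passed to the vectorizer would first be tokenized and lemmatized with spaCy.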
## 📦 Files
- `spam_classifier_bundle.joblib`: Includes trained model, TF-IDF vectorizer, label encoder, and feature selector
## 📥 Load Model (Example)
```python
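from huggingface_hub import hf_hub_download
import joblib
# Download the bundle (cached locally after the first call), then unpickle it.
bundle_path = hf_hub_download(
    repo_id="mageshcruz/spam-classifier-scikit",
    filename="spam_classifier_bundle.joblib",
)
bundle = joblib.load(bundle_path)
model = bundle["model"]          # trained logistic regression classifier
vector = bundle["vectorizer"]    # fitted TF-IDF vectorizer
selector = bundle["selector"]    # fitted chi2 feature selector
le = bundle["label_encoder"]     # label encoder for the spam/ham classes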