---
license: mit
tags:
- spam
- text-classification
- scikit-learn
- tfidf
- spaCy
- logistic-regression
language: en
datasets: custom
model-index:
- name: Spam Classifier (Scikit-learn + spaCy)
  results: []
---
# 📧 Spam Classifier (Scikit-learn + spaCy)

This model classifies messages as **spam** or **ham** using a traditional NLP pipeline: spaCy preprocessing, TF-IDF features, and logistic regression.
## 🧠 Model Details

- **Preprocessing**: Tokenization and lemmatization with spaCy
- **Vectorization**: TF-IDF over unigrams and bigrams (1-2 grams)
- **Feature Selection**: Chi-squared (Chi2) test, keeping the top 1000 features
- **Model**: Logistic Regression (`class_weight="balanced"`, `max_iter=1000`)
- **Performance**: ~87% accuracy on a balanced test set (800 spam, 800 ham)
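
The steps listed above can be sketched as a small scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the training script behind this model: the toy messages are invented, `preprocess` is a hypothetical stand-in for the spaCy tokenization and lemmatization step, and `k` is capped because the toy vocabulary has far fewer than 1000 features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy data invented for illustration only (not the model's training set).
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "free entry win cash today",
    "urgent prize claim now",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
    "thanks for the lunch yesterday",
]
labels = ["spam"] * 4 + ["ham"] * 4


def preprocess(text):
    # Hypothetical stand-in for spaCy tokenization + lemmatization.
    return text.lower().strip()


# Encode string labels as integers (ham -> 0, spam -> 1).
le = LabelEncoder()
y = le.fit_transform(labels)

# TF-IDF over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform([preprocess(t) for t in texts])

# Chi-squared feature selection; the card uses the top 1000 features,
# capped here because the toy vocabulary is much smaller.
k = min(1000, X.shape[1])
selector = SelectKBest(chi2, k=k)
X_sel = selector.fit_transform(X, y)

# Logistic regression with the hyperparameters listed above.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_sel, y)


def predict(text):
    # Apply the same transformations, in the same order, as during training.
    features = selector.transform(vectorizer.transform([preprocess(text)]))
    return le.inverse_transform(model.predict(features))[0]


print(predict("free cash prize"))
```

At inference time the order matters: the text has to pass through the same preprocessing, vectorizer, and selector that were fitted during training before it reaches the classifier.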
## 📦 Files

- `spam_classifier_bundle.joblib`: Includes trained model, TF-IDF vectorizer, label encoder, and feature selector
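
For readers unfamiliar with this kind of bundle, the sketch below shows one way several fitted components could be stored in a single joblib file as a dictionary. The keys mirror the ones read in the loading example (`model`, `vectorizer`, `selector`, `label_encoder`); the tiny dataset, the `k="all"` setting, and the `demo_bundle.joblib` file name are assumptions made for illustration, and the author's actual export code may differ.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Tiny invented dataset, only so that the components can be fitted.
texts = ["free prize now", "win free cash", "lunch at noon", "see you tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# k="all" keeps every feature here; the card's setup keeps the top 1000.
selector = SelectKBest(chi2, k="all")
X_sel = selector.fit_transform(X, y)

model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_sel, y)

# Store all fitted components together so they stay in sync.
bundle = {
    "model": model,
    "vectorizer": vectorizer,
    "selector": selector,
    "label_encoder": label_encoder,
}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_bundle.joblib")
joblib.dump(bundle, path)

# Loading returns the same dictionary of fitted objects.
restored = joblib.load(path)
print(sorted(restored.keys()))
```

Keeping the vectorizer, selector, and label encoder in the same file as the classifier avoids mismatches between the features the model was trained on and the features produced at inference time.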
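## 📥 Load Model (Example)

```python
from huggingface_hub import hf_hub_download
import joblib

# Download the serialized bundle from the Hub and load it with joblib
bundle = joblib.load(hf_hub_download("mageshcruz/spam-classifier-scikit", "spam_classifier.joblib"))

# Unpack the individual components stored in the bundle
model = bundle["model"]            # trained Logistic Regression classifier
vector = bundle["vectorizer"]      # fitted TF-IDF vectorizer
selector = bundle["selector"]      # fitted Chi2 feature selector
le = bundle["label_encoder"]       # label encoder for the spam/ham classes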