	---
	license: mit
	tags:
	- spam
	- text-classification
	- scikit-learn
	- tfidf
	- spaCy
	- logistic-regression
	language: en
	datasets: custom
	model-index:
	- name: Spam Classifier (Scikit-learn + spaCy)
	results: []
	---

	# 📧 Spam Classifier (Scikit-learn + spaCy)

	This model classifies messages as spam or ham using traditional NLP techniques.

	## 🧠 Model Details

	- Preprocessing: Tokenization + Lemmatization using spaCy
	- Vectorization: TF-IDF (1-2 grams)
	- Feature Selection: Chi2 with top 1000 features
	- Model: Logistic Regression (`class_weight="balanced"`, `max_iter=1000`)
	- Performance: ~87% accuracy on a balanced test set (800 spam, 800 ham)
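
The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's training script: the toy messages are made up, the spaCy tokenization and lemmatization step is omitted so the example stays self-contained, and `k` is capped at the vocabulary size because the toy corpus has far fewer than 1000 features. The `predict` helper is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy, made-up messages for illustration only (not the model's training data).
# Assumption: in the real pipeline, texts would first be lemmatized with spaCy.
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Encode string labels as integers (classes are sorted alphabetically).
le = LabelEncoder()
y = le.fit_transform(labels)

# TF-IDF over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Chi-squared feature selection; the card uses the top 1000 features,
# capped here because the toy vocabulary is much smaller.
k = min(1000, X.shape[1])
selector = SelectKBest(chi2, k=k)
X_sel = selector.fit_transform(X, y)

# Logistic regression with the hyperparameters named in the card.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_sel, y)


def predict(messages):
    """Hypothetical helper: vectorize, select features, classify, decode labels."""
    features = selector.transform(vectorizer.transform(messages))
    return list(le.inverse_transform(model.predict(features)))
```

Calling `predict(["free prize now"])` returns a one-element list containing either `"spam"` or `"ham"`. Applying the vectorizer, selector, and label encoder in the same order at inference time as at training time is what makes bundling all four objects together convenient.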

	## 📦 Files

	- `spam_classifier_bundle.joblib`: Contains the trained model, TF-IDF vectorizer, label encoder, and feature selector

	## 📥 Load Model (Example)

	```python
	from huggingface_hub import hf_hub_download
	import joblib

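	# Download the bundle from the Hub (filename as listed under Files), then load it.
	bundle_path = hf_hub_download(
	    repo_id="mageshcruz/spam-classifier-scikit",
	    filename="spam_classifier_bundle.joblib",
	)
	bundle = joblib.load(bundle_path)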
	model = bundle["model"]
	vector = bundle["vectorizer"]
	selector = bundle["selector"]
	le = bundle["label_encoder"]