lokas/spam-emails-classifier

lokas committed on Jun 22, 2025 · Commit 87b08b3 (verified) · 1 parent: e58cd1e

Update README.md

Files changed (1)

README.md +60 -2

@@ -1,2 +1,60 @@
-# LSTM Spam Detector
-This is a simple LSTM model to detect spam messages.

+---
+language: en
+license: mit
+tags:
+  - keras
+  - lstm
+  - spam-classification
+  - text-classification
+  - binary-classification
+  - email
+  - deep-learning
+library_name: keras
+pipeline_tag: text-classification
+model_name: Spam Email Classifier (BiLSTM)
+datasets:
+  - SetFit/enron_spam
+---
+# 📧 Spam Email Classifier using BiLSTM
+This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.
+
+---
+## 🧠 Model Architecture
+- **Tokenizer**: Keras `Tokenizer` trained on the Enron dataset
+- **Embedding**: Pretrained [GloVe.6B.100d](https://nlp.stanford.edu/projects/glove/)
+- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
+- **Input**: English email/message text
+- **Output**: `0 = Ham`, `1 = Spam`
+---
+## 🧪 Example Usage
+```python
+from tensorflow.keras.models import load_model
+from huggingface_hub import hf_hub_download
+import pickle
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+# Load files from HF Hub
+model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
+tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")
+# Load model and tokenizer
+model = load_model(model_path)
+with open(tokenizer_path, "rb") as f:
+    tokenizer = pickle.load(f)
+# Prediction function
+def predict_spam(text):
+    seq = tokenizer.texts_to_sequences([text])
+    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
+    pred = model.predict(padded, verbose=0)[0][0]  # sigmoid probability of Spam (class 1)
+    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"
+# Example
+print(predict_spam("Win a free iPhone now!"))
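The architecture list in the README says the `Embedding` layer starts from pretrained GloVe.6B.100d vectors, but it does not show how those vectors are lined up with the Keras tokenizer's vocabulary. Below is a minimal sketch of the usual approach, assuming the standard GloVe text format (a word followed by space-separated floats on each line) and a Keras-style `word_index` dict whose indices start at 1, with row 0 reserved for padding. The helper name `build_embedding_matrix` and its `num_words` cap are hypothetical illustrations, not code from this repository.

```python
import io
from typing import Dict, Iterable, Optional

import numpy as np


def build_embedding_matrix(
    glove_lines: Iterable[str],
    word_index: Dict[str, int],
    dim: int = 100,
    num_words: Optional[int] = None,
) -> np.ndarray:
    """Hypothetical helper: map GloVe vectors onto a Keras-style word_index.

    Row i holds the vector for the word whose tokenizer index is i.
    Row 0 (padding) and words missing from GloVe stay all-zero.
    """
    if num_words is None:
        num_words = max(word_index.values(), default=0) + 1
    matrix = np.zeros((num_words, dim), dtype="float32")
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        idx = word_index.get(word)
        # Skip words the tokenizer never saw, indices beyond the cap,
        # and malformed lines with the wrong number of components.
        if idx is None or idx >= num_words or len(values) != dim:
            continue
        matrix[idx] = np.asarray(values, dtype="float32")
    return matrix


# Toy stand-in for glove.6B.100d.txt, using 3-dimensional vectors.
toy_glove = io.StringIO("free 0.1 0.2 0.3\nwin 0.4 0.5 0.6\nhello 0.7 0.8 0.9\n")
toy_word_index = {"win": 1, "free": 2, "iphone": 3}  # shaped like Tokenizer.word_index
emb = build_embedding_matrix(toy_glove, toy_word_index, dim=3)
print(emb.shape)  # (4, 3)
```

In a real run you would pass `open("glove.6B.100d.txt", encoding="utf-8")` and `tokenizer.word_index`. The `num_words` cap exists because a Keras `Tokenizer` keeps every seen word in `word_index` even when its own `num_words` limit is set; only `texts_to_sequences` applies that limit. The resulting matrix could then seed a layer such as `Embedding(num_words, 100, embeddings_initializer=keras.initializers.Constant(matrix), trainable=False)`; whether this model froze its embeddings is not stated in the README, so `trainable=False` is an assumption.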