mltrev23 committed
Commit a0520f6 · verified · 1 Parent(s): 3cb7184

Update README.md

  1. README.md +117 -3
README.md CHANGED
@@ -1,3 +1,117 @@
1
- ---
2
- license: mit
3
- ---
1
+ # Spam Classification Models
2
+
3
+ ## Overview
4
+
5
+ This repository contains two models designed for detecting spam in SMS messages, both trained on the `mltrev23/spam-classify` dataset. The models include:
6
+
7
+ 1. **Spam Classifier**: A machine learning model trained to classify SMS messages as either spam or ham (non-spam).
8
+ 2. **Count Vectorizer**: A vectorization model used to transform SMS text data into numerical feature vectors suitable for classification.
9
+
10
+ ## Models
11
+
12
+ ### 1. Spam Classifier
13
+
14
+ - **Filename**: `spam_classifier.pkl`
15
+ - **Type**: Multinomial Naive Bayes (not confirmed here; check `type(classifier)` after loading the file)
16
+ - **Input**: Numerical feature vectors (output from the Count Vectorizer)
17
+ - **Output**: Binary classification (`spam` or `ham`)
18
+
19
+ ### 2. Count Vectorizer
20
+
21
+ - **Filename**: `count_vectorizer.pkl`
22
+ - **Type**: Scikit-learn's `CountVectorizer`
23
+ - **Input**: Raw SMS text data
24
+ - **Output**: Sparse matrix of token counts
25
+
26
+ ## Dataset
27
+
28
+ Both models were trained on the `mltrev23/spam-classify` dataset, which consists of SMS messages labeled as either spam or ham. The dataset includes a diverse set of SMS messages that provide a robust training set for detecting unwanted or harmful content.
29
+
30
+ ## Installation
31
+
32
+ To use these models, first clone this repository and install the required Python packages:
33
+
34
+ ```bash
35
+ git clone https://huggingface.co/yourusername/spam-classification
36
+ cd spam-classification
37
+ pip install -r requirements.txt
38
+ ```
39
+
40
+ ### Requirements
41
+
42
+ The models require the following Python libraries:
43
+
44
+ ```bash
45
+ pip install scikit-learn joblib
46
+ pip install numpy
47
+ ```
48
+
49
+ ## Usage
50
+
51
+ ### Loading the Models
52
+
53
+ You can load the models using the `joblib` library:
54
+
55
+ ```python
56
+ import joblib
57
+
58
+ # Load the count vectorizer
59
+ vectorizer = joblib.load('count_vectorizer.pkl')
60
+
61
+ # Load the spam classifier
62
+ classifier = joblib.load('spam_classifier.pkl')
63
+ ```
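For readers who want to see how a pair of `.pkl` files like these could be produced and read back, the following is a rough round-trip sketch. It trains a toy `MultinomialNB` on four invented messages, which is an assumption for illustration and not the author's training procedure. Note that `joblib.load` unpickles arbitrary Python objects, so only load files from sources you trust, ideally with a scikit-learn version close to the one used for saving.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy data invented for this example only.
toy_messages = [
    "win a free prize now",
    "free entry text now",
    "call you later",
    "see you at lunch",
]
toy_labels = ["spam", "spam", "ham", "ham"]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB().fit(
    toy_vectorizer.fit_transform(toy_messages), toy_labels
)

with tempfile.TemporaryDirectory() as tmp_dir:
    vec_path = os.path.join(tmp_dir, "count_vectorizer.pkl")
    clf_path = os.path.join(tmp_dir, "spam_classifier.pkl")

    # Save both objects: the classifier is only usable together with the
    # vectorizer that defined its feature columns.
    joblib.dump(toy_vectorizer, vec_path)
    joblib.dump(toy_classifier, clf_path)

    loaded_vectorizer = joblib.load(vec_path)
    loaded_classifier = joblib.load(clf_path)

reloaded_prediction = loaded_classifier.predict(
    loaded_vectorizer.transform(["free prize"])
)
print(reloaded_prediction)
```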
64
+
65
+ ### Predicting Spam Messages
66
+
67
+ To classify new SMS messages, follow these steps:
68
+
69
+ ```python
70
+ # Sample SMS messages
71
+ messages = ["Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)",
72
+ "I'll call you later"]
73
+
74
+ # Transform the messages into feature vectors
75
+ X = vectorizer.transform(messages)
76
+
77
+ # Predict using the classifier
78
+ predictions = classifier.predict(X)
79
+
80
+ # Output the predictions
81
+ for message, prediction in zip(messages, predictions):
82
+     print(f"Message: {message}\nPrediction: {'Spam' if prediction == 'spam' else 'Ham'}\n")
83
+ ```
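If hard labels are too coarse, a classifier that exposes `predict_proba` (as `MultinomialNB` does) can return a spam probability that you threshold yourself. The sketch below assumes string labels `"spam"` and `"ham"`, uses a toy model trained on invented messages, and picks a hypothetical threshold of 0.8; none of these values come from the repository.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the shipped vectorizer and classifier.
toy_messages = [
    "win a free prize now",
    "free entry text now",
    "call you later",
    "see you at lunch",
]
toy_labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(toy_messages), toy_labels)

new_messages = ["free prize now", "see you later"]
probabilities = clf.predict_proba(vec.transform(new_messages))

# Columns of predict_proba follow the order of clf.classes_.
spam_index = list(clf.classes_).index("spam")
spam_scores = probabilities[:, spam_index]

THRESHOLD = 0.8  # hypothetical cut-off; tune it on held-out data
flags = spam_scores >= THRESHOLD

for message, score, flagged in zip(new_messages, spam_scores, flags):
    print(f"{message!r}: P(spam)={score:.3f} flagged={flagged}")
```

A higher threshold trades missed spam for fewer legitimate messages flagged, which is often the preferable error for SMS filtering.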
84
+
85
+ ### Evaluating the Classifier
86
+
87
+ You can also evaluate the performance of the classifier using a test set from the same dataset:
88
+
89
+ ```python
90
+ from sklearn.metrics import accuracy_score, classification_report
91
+
92
+ # Assumes `test_messages` (list of str) and `test_labels` (list of 'spam'/'ham') are already defined
93
+ X_test = vectorizer.transform(test_messages)
94
+ y_pred = classifier.predict(X_test)
95
+
96
+ print("Accuracy:", accuracy_score(test_labels, y_pred))
97
+ print(classification_report(test_labels, y_pred))
98
+ ```
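Accuracy alone can be misleading when one class dominates, which is common for spam data (whether it holds for this dataset is not stated here). As a small worked example, the sketch below computes per-class metrics on six hand-written labels and predictions; the values are invented so the arithmetic can be checked by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-written labels and predictions, for illustration only.
test_labels = ["ham", "ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "ham", "spam", "spam", "ham"]

accuracy = accuracy_score(test_labels, y_pred)  # 4 correct out of 6

# Treat "spam" as the positive class: 1 true positive, 1 false positive,
# 1 false negative.
spam_precision = precision_score(test_labels, y_pred, pos_label="spam")
spam_recall = recall_score(test_labels, y_pred, pos_label="spam")

# Rows are true labels, columns are predicted labels, in the given order.
matrix = confusion_matrix(test_labels, y_pred, labels=["ham", "spam"])

print(f"accuracy={accuracy:.3f} precision={spam_precision} recall={spam_recall}")
print(matrix)
```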
99
+
100
+ ## Model Interpretation
101
+
102
+ To understand why the classifier makes a given decision, inspect its learned parameters. For a Multinomial Naive Bayes model, the per-class token log-probabilities (`feature_log_prob_`) show which words most strongly indicate each class (spam or ham).
103
+
104
+ ## Contributing
105
+
106
+ If you wish to contribute to this repository by improving the models or expanding the dataset, feel free to submit a pull request. Please ensure that your code is well-documented and adheres to the existing style.
107
+
108
+ ## License
109
+
110
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
111
+
112
+ ## References
113
+
114
+ If you use these models in your research or project, please cite the dataset and relevant model training methods as follows:
115
+
116
+ - **Dataset**: `mltrev23/spam-classify`
117
+ - **Naive Bayes**: McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In *AAAI-98 Workshop on Learning for Text Categorization* (pp. 41-48).