JohanBeytell committed on
Commit
c110d02
·
verified ·
1 Parent(s): c1112f1

Update README.md

Files changed (1)
  1. README.md +82 -3
README.md CHANGED
@@ -1,3 +1,82 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ metrics:
6
+ - precision
7
+ - recall
8
+ - f1
9
+ - accuracy
10
+ pipeline_tag: text-classification
11
+ tags:
12
+ - classification
13
+ - security
14
+ ---
15
+
16
+ # Model Card for Infinitode/SMCM-OPEN-ARC
17
+
18
+ Repository: https://github.com/Infinitode/OPEN-ARC/
19
+
20
+ ## Model Description
21
+
22
+ OPEN-ARC-SMC is a multinomial Naive Bayes (`MultinomialNB`) classifier developed as part of Infinitode's OPEN-ARC initiative. It classifies text, particularly emails and SMS messages, as either spam or legitimate (ham).
23
+
24
+ **Architecture**:
25
+
26
+ - **Model**: `MultinomialNB` with default parameters.
27
+ - **Framework**: scikit-learn.
28
+ - **Training Setup**: Trained with default parameters.
29
+
30
+ ## Uses
31
+
32
+ - Classifying emails or SMS messages as spam or legitimate.
33
+ - Supporting research on spam and the development of defensive measures against spammers.
34
+
35
+ ## Limitations
36
+
37
+ The model can produce false positives (legitimate messages flagged as spam) and false negatives (spam classified as legitimate) because of the nature of the training data and its inherent limitations.
38
+
39
+ ## Training Data
40
+
41
+ - Dataset: Spam Mail Classifier dataset from Kaggle.
42
+ - Source URL: https://www.kaggle.com/datasets/mosapabdelghany/spam-mail-classifier/
43
+ - Content: Messages categorized as either spam or ham (legitimate emails or SMS).
44
+ - Size: 1000 email/SMS messages labeled as spam or ham.
45
+ - Preprocessing: Rows with missing values were removed, and the text was converted into numeric feature vectors.
46
+
47
+ ## Training Procedure
48
+
49
+ - Metrics: accuracy, precision, recall, F1
50
+ - Train/Test Split: 80% train, 20% test.
51
+
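The setup described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual training script: `CountVectorizer` is an assumed choice (the card only says the text was converted into vectors), and the small inline dataset, the `random_state` value, and the stratified split are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in data; the real model was trained on the Kaggle dataset.
texts = [
    "Win a free prize now", "Claim your free cash reward",
    "Free entry, click to win", "Urgent: you won a lottery",
    "Cheap loans, apply now", "Meeting at 10 AM tomorrow",
    "Can you send the report?", "Lunch later today?",
    "See you at the office", "Thanks for your help yesterday",
]
labels = ["spam"] * 5 + ["ham"] * 5

# 80% train / 20% test split, as stated in the card.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Assumed vectorizer: bag-of-words counts, fitted on the training text only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# MultinomialNB with default parameters, as stated in the card.
model = MultinomialNB()
model.fit(X_train, y_train)
test_predictions = model.predict(X_test)
```

Fitting the vectorizer on the training split only keeps test-set vocabulary from leaking into the features.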
52
+ ## Evaluation Results
53
+
54
+ | Metric | Value |
55
+ | ------ | ----- |
56
+ | Testing Accuracy | 98.48% |
57
+ | Testing Precision (`spam`) | 96.15% |
58
+ | Testing Recall (`spam`) | 93.17% |
59
+ | Testing F1 (`spam`) | 94.64% |
60
+
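F1 is the harmonic mean of precision and recall, so the reported `spam` figures can be cross-checked against each other. The short calculation below uses only the values from the table above.

```python
# Reported test metrics for the `spam` class, taken from the table.
precision = 0.9615
recall = 0.9317

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 (spam): {f1:.2%}")  # prints "F1 (spam): 94.64%"
```

The result agrees with the reported F1 of 94.64%.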
61
+ ## How to Use
62
+
63
+ ```python
64
+ new_emails = [
65
+     "Congratulations! You've won a free prize. Click the link to claim.", # Likely spam
66
+     "Hi, just confirming our meeting for tomorrow at 10 AM. Thanks." # Likely not spam
67
+ ]
68
+
69
+ # Vectorize the new emails using the fitted vectorizer (`vectorizer` and `model` are assumed to be the objects fitted during training)
70
+ new_emails_vectorized = vectorizer.transform(new_emails)
71
+
72
+ # Make predictions
73
+ predictions = model.predict(new_emails_vectorized)
74
+
75
+ for i, email in enumerate(new_emails):
76
+     print(f"\nEmail: '{email}'")
77
+     print(f"Prediction: {predictions[i]}")
78
+ ```
79
+
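Where a confidence score is more useful than a hard label, `MultinomialNB` also exposes `predict_proba`. The sketch below is illustrative only: the toy training data and the use of `CountVectorizer` are hypothetical stand-ins for the fitted `vectorizer` and `model` used in the snippet above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the fitted objects from training.
train_texts = [
    "win a free prize", "free cash click now",
    "meeting tomorrow at 10", "thanks for the report",
]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

new_emails = ["free prize click", "report for the meeting"]
probabilities = model.predict_proba(vectorizer.transform(new_emails))

# Column order follows model.classes_, so look up the spam column explicitly.
spam_index = list(model.classes_).index("spam")
for email, row in zip(new_emails, probabilities):
    print(f"{email!r}: P(spam) = {row[spam_index]:.2f}")
```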
80
+ ## Contact
81
+
82
+ For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.