hardik-0212 committed · Commit 2cb1f58 · verified · 1 Parent(s): 727bfc2

Upload README.md with huggingface_hub

  1. README.md +125 -0
README.md ADDED
@@ -0,0 +1,125 @@
# 🧠 Machine Learning Model Comparison – Classification Project

This project compares several supervised machine learning algorithms on classification tasks with structured data. Each model was evaluated on speed, accuracy, and practical usability.

---

## 📌 Models Included

| No. | Model Name             | Type                          |
|-----|------------------------|-------------------------------|
| 1   | Logistic Regression    | Linear Model                  |
| 2   | Random Forest          | Ensemble (Bagging)            |
| 3   | K-Nearest Neighbors    | Instance-Based (Lazy)         |
| 4   | XGBoost                | Gradient Boosting             |
| 5   | Support Vector Machine | Margin-based Classifier       |
| 6   | ANN (MLPClassifier)    | Neural Network                |
| 7   | LightGBM               | Gradient Boosting (Histogram) |
| 8   | Naive Bayes            | Probabilistic                 |

---

## 📊 Accuracy Summary

| Model               | Accuracy (approx.) | Speed             |
|---------------------|--------------------|-------------------|
| Logistic Regression | ~84%               | 🔥 Very Fast      |
| Random Forest       | ~95%               | ⚡ Medium         |
| KNN                 | ~84%               | 🐢 Slow           |
| XGBoost             | ~90%               | ⚡ Medium         |
| SVM                 | ~85%               | ⚡ Medium         |
| ANN (MLP)           | ~51%               | ⚡ Medium         |
| LightGBM            | ~90%               | 🚀 Fastest        |
| Naive Bayes         | ~80%               | 🚀 Extremely Fast |

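Accuracy here means the share of samples classified correctly. As a minimal sketch of how such a figure and a rough timing could be computed (the helper names `accuracy` and `timed` and the toy labels are illustrative assumptions, not code from this project):

```python
import time

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Toy labels, purely illustrative: 4 of the 5 predictions are correct.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
score, seconds = timed(accuracy, y_true, y_pred)
print(f"accuracy = {score:.0%}, measured in {seconds:.6f} s")
```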
---

## 🧠 Model Descriptions

---

### 1. **Logistic Regression**
- A linear model that predicts class probabilities by applying a sigmoid function to a weighted sum of the features.
- ✅ Best for fast, interpretable binary classification.
- ❌ Not ideal for non-linear or complex patterns.

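To make the sigmoid step concrete, here is a minimal sketch of how a fitted logistic regression turns features into a probability and then a class. The weights, bias, and input values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and input, for illustration only.
weights, bias = [0.8, -0.4], 0.1
p = predict_proba(weights, bias, [2.0, 1.0])
label = int(p >= 0.5)  # threshold the probability to get a class
print(round(p, 3), label)
```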
---

### 2. **Random Forest**
- An ensemble of decision trees trained on bootstrap samples and combined by majority voting.
- ✅ Excellent accuracy and robustness.
- ❌ Slower and harder to interpret than simpler models.

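The voting step can be sketched in a few lines. This assumes each tree has already produced a label for the sample; the votes below are made up. Note that scikit-learn's `RandomForestClassifier` averages the trees' class probabilities rather than counting hard votes, so this is the textbook version, not that library's exact behavior.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most common label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical votes from five trees for a single sample.
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))
```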
---

### 3. **K-Nearest Neighbors (KNN)**
- A lazy learner that predicts from the labels of the nearest training points.
- ✅ Simple, with no explicit training phase.
- ❌ Prediction is very slow on large datasets; sensitive to noise.

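A minimal pure-Python sketch of the idea, assuming Euclidean distance and a simple majority among the `k` closest points (the function name `knn_predict` and the toy data are illustrative). It also shows why prediction is slow: every query is compared against every stored training point.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority label among its k nearest training points."""
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2D data, purely illustrative.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["a", "a", "a", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0)))
```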
---

### 4. **XGBoost**
- A boosting algorithm that builds trees sequentially, each one correcting the errors of the trees before it.
- ✅ High accuracy, built-in regularization and feature importance.
- ❌ Tuning is somewhat complex; slower than simpler models.

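The sequential idea can be illustrated with a deliberately simplified sketch: one-dimensional regression with squared error, where each round fits a decision stump to the current residuals. This is not XGBoost's actual algorithm (which adds regularization and second-order gradient information, among other things); the names `fit_stump`, `boost_fit`, and `boost_predict` and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        err = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, l_mean, r_mean)
    return best[1], best[2], best[3]

def boost_fit(xs, ys, n_rounds=20, lr=0.5):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, l_val, r_val = fit_stump(xs, residuals)
        stumps.append((t, l_val, r_val))
        preds = [p + lr * (l_val if x <= t else r_val) for x, p in zip(xs, preds)]
    return base, stumps

def boost_predict(base, stumps, x, lr=0.5):
    """Sum the base value and the scaled contribution of every stump."""
    return base + sum(lr * (l_val if x <= t else r_val) for t, l_val, r_val in stumps)

# Toy data with a step between x=3 and x=4, purely illustrative.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
base, stumps = boost_fit(xs, ys)
print(boost_predict(base, stumps, 2), boost_predict(base, stumps, 5))
```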
---

### 5. **Support Vector Machine (SVM)**
- Separates classes by finding the maximum-margin hyperplane.
- ✅ Excellent for high-dimensional data, and for non-linear data when used with a kernel.
- ❌ Doesn’t scale well to large datasets; requires feature scaling.

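Feature scaling usually means standardizing each column to zero mean and unit variance so that no feature dominates the distance or margin computation. A minimal sketch with hypothetical columns follows; in scikit-learn the same transformation is provided by `StandardScaler`, which should be fitted on the training split only.

```python
import statistics

def standardize(column):
    """Rescale a feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        return [0.0 for _ in column]  # a constant feature carries no information
    return [(value - mean) / std for value in column]

# Two hypothetical features on very different scales.
age = [25, 35, 45, 55]
income = [30_000, 50_000, 70_000, 90_000]
scaled_age = standardize(age)
scaled_income = standardize(income)
print(scaled_age)
print(scaled_income)  # same shape of values once the scale is removed
```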
---

### 6. **ANN (MLPClassifier – sklearn)**
- A basic feedforward neural network with hidden layers.
- ✅ Capable of learning complex patterns.
- ❌ Low accuracy in this project (~51%); needs better tuning and data scaling.

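MLPs are sensitive to feature scale, so one plausible first step toward better accuracy is to standardize the inputs inside a pipeline. The sketch below uses synthetic data from `make_classification` as a stand-in, because the project's dataset is not shown; the hidden layer size and iteration limit are arbitrary assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the project's real dataset is not shown here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The scaler is part of the pipeline, so it is fitted on the training split only.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```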
---

### 7. **LightGBM**
- A gradient boosting framework optimized for speed and memory.
- ✅ Faster than XGBoost; supports categorical features directly.
- ❌ Can overfit small datasets if not tuned well.

---

### 8. **Naive Bayes (GaussianNB)**
- A probabilistic classifier that assumes features are independent given the class.
- ✅ Fastest model to train; Naive Bayes variants work well for text and high-dimensional data.
- ❌ The independence assumption rarely holds; weak for complex patterns.

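The independence assumption is what keeps the model cheap: each feature gets its own per-class Gaussian, and the log-likelihoods are simply added. A minimal pure-Python sketch of that idea follows; the function names and toy data are illustrative assumptions, and the small variance floor is a simplification rather than scikit-learn's exact smoothing.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature mean/variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Toy data, purely illustrative.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.1, 2.1)))
```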
---

## 🧪 Recommendation Summary

| Best For                  | Model               |
|---------------------------|---------------------|
| Highest Accuracy          | Random Forest       |
| Fastest Training          | Naive Bayes         |
| Best for Large Data       | LightGBM            |
| Best Baseline             | Logistic Regression |
| Best for Clean Data       | SVM                 |
| Best for Speed + Accuracy | XGBoost             |

---

## 📎 Resources Included

- 📁 `model.pkl` files for each classifier
- 📄 `cart.docx` with graphs, charts, and performance analysis
- 🧾 This `README.md` as the model card

For more information, see `cart.docx`.

---

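## 🔧 How to Use

```python
from joblib import load
import numpy as np

model = load("XGBoost_model.pkl")
# Assumption: the model was trained on numeric tabular features, so predict()
# needs a 2D array (n_samples, n_features). The zero row is only a placeholder.
sample = np.zeros((1, model.n_features_in_))
prediction = model.predict(sample)
print(prediction)
```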