nahiar committed
Commit 35d1a1d · verified · 1 Parent(s): 76eb959

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -1 +1,10 @@
1
  *.pkl filter=lfs diff=lfs merge=lfs -text
2
+ images/01_class_distribution.png filter=lfs diff=lfs merge=lfs -text
3
+ images/02_feature_correlation.png filter=lfs diff=lfs merge=lfs -text
4
+ images/03_correlation_matrix.png filter=lfs diff=lfs merge=lfs -text
5
+ images/05_baseline_roc_curve.png filter=lfs diff=lfs merge=lfs -text
6
+ images/07_baseline_feature_importance.png filter=lfs diff=lfs merge=lfs -text
7
+ images/08_cross_validation.png filter=lfs diff=lfs merge=lfs -text
8
+ images/10_tuned_roc_curve.png filter=lfs diff=lfs merge=lfs -text
9
+ images/12_tuned_feature_importance.png filter=lfs diff=lfs merge=lfs -text
10
+ images/13_model_comparison.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,255 +1,243 @@
1
- ---
2
- language: en
3
- license: mit
4
- tags:
5
- - bot-detection
6
- - instagram
7
- - random-forest
8
- - sklearn
9
- - social-media
10
- - classification
11
- metrics:
12
- - accuracy
13
- - precision
14
- - recall
15
- - f1
16
- - roc-auc
17
- library_name: scikit-learn
18
- ---
19
-
20
- # Instagram Bot Detection Model
21
-
22
- ## Model Description
23
-
24
- This Random Forest classifier is designed to detect bot accounts on Instagram based on profile features and behavioral patterns. The model analyzes various account characteristics to determine whether an account is likely automated (bot) or genuine (human).
25
-
26
- ## Model Details
27
 
28
- - **Model Type**: Random Forest Classifier
29
- - **Framework**: scikit-learn
30
- - **Task**: Binary Classification (Bot vs Human)
31
- - **Language**: Python
32
- - **License**: MIT
33
 
34
- ## Performance Metrics
 
 
 
35
 
36
- The model achieves exceptional performance on the test dataset:
37
 
38
- - **ROC-AUC Score**: 0.9988
39
- - **Accuracy**: Near-perfect accuracy in distinguishing bots from legitimate accounts
40
 
41
- The ROC curve demonstrates outstanding discriminative ability with an AUC of 0.9988, indicating near-perfect model performance.
 
 
 
 
 
 
 
 
42
 
43
- ## Features Used
 
 
 
44
 
45
- The model uses the following 12 features for prediction:
46
 
47
- 1. **IsPrivate** - Whether the account is set to private
48
- 2. **IsVerified** - Whether the account has a verification badge
49
- 3. **HasProfilePic** - Whether the account has a profile picture
50
- 4. **FollowingCount** - Number of accounts being followed
51
- 5. **FollowerCount** - Number of followers
52
- 6. **HasExternalUrl** - Whether there's an external URL in the profile
53
- 7. **HasBio** - Whether the account has a bio description
54
- 8. **HasPosts** - Whether the account has made any posts
55
- 9. **PostsCount** - Total number of posts
56
- 10. **FollowToFollowerRatio** - Ratio of following to followers
57
- 11. **IsBusinessAccount** - Whether the account is a business account
58
- 12. **HasHighlights** - Whether the account has story highlights
59
 
60
- ## Intended Use
 
 
 
 
 
 
 
61
 
62
- ### Primary Uses
63
 
64
- - Identifying potential bot accounts on Instagram
65
- - Content moderation and platform integrity
66
- - Research on social media bot behavior
67
- - Automated account screening
68
- - Spam detection systems
69
 
70
- ### Out-of-Scope Uses
 
 
 
 
 
 
71
 
72
- - This model is specifically trained for Instagram and should not be used for other platforms without retraining
73
- - Should not be the sole basis for account suspension decisions
74
- - Not designed for real-time detection without proper infrastructure
75
- - Not suitable for detecting sophisticated bot networks without additional features
76
 
77
- ## How to Use
 
 
78
 
79
- ### Installation
80
 
81
- ```bash
82
- pip install scikit-learn pandas numpy joblib
83
- ```
84
 
85
- ### Loading the Model
 
 
 
 
 
 
 
 
 
86
 
87
- ```python
88
- import joblib
89
- import pandas as pd
90
- import numpy as np
91
- from sklearn.preprocessing import MinMaxScaler
92
-
93
- # Load the model
94
- model = joblib.load('IG_BOT_Detection_Model_v1.pkl')
95
-
96
- # Prepare your data
97
- features = ['IsPrivate', 'IsVerified', 'HasProfilePic', 'FollowingCount',
98
- 'FollowerCount', 'HasExternalUrl', 'HasBio', 'HasPosts',
99
- 'PostsCount', 'FollowToFollowerRatio', 'IsBusinessAccount',
100
- 'HasHighlights']
101
-
102
- # Example account data
103
- account_data = {
104
- 'IsPrivate': 0,
105
- 'IsVerified': 0,
106
- 'HasProfilePic': 1,
107
- 'FollowingCount': 7500,
108
- 'FollowerCount': 150,
109
- 'HasExternalUrl': 1,
110
- 'HasBio': 0,
111
- 'HasPosts': 1,
112
- 'PostsCount': 20,
113
- 'FollowToFollowerRatio': 50.0,
114
- 'IsBusinessAccount': 0,
115
- 'HasHighlights': 0
116
- }
117
 
118
- # Create DataFrame
119
- df = pd.DataFrame([account_data])
120
 
121
- # Scale features (use the same scaler as training)
122
- scaler = MinMaxScaler()
123
- # Note: In production, you should save and load the scaler from training
124
- df_scaled = scaler.fit_transform(df[features])
 
125
 
126
- # Make prediction
127
- prediction = model.predict(df_scaled)
128
- probability = model.predict_proba(df_scaled)
129
 
130
- print(f"Prediction: {'Bot' if prediction[0] == 1 else 'Human'}")
131
- print(f"Confidence - Human: {probability[0][0]:.2%}, Bot: {probability[0][1]:.2%}")
132
- ```
133
 
134
- ### Batch Prediction
 
 
 
 
 
 
135
 
136
- ```python
137
- # For multiple accounts
138
- accounts_df = pd.read_csv('instagram_accounts_to_check.csv')
139
- accounts_scaled = scaler.transform(accounts_df[features])
 
 
 
140
 
141
- predictions = model.predict(accounts_scaled)
142
- probabilities = model.predict_proba(accounts_scaled)
143
 
144
- # Add results to DataFrame
145
- accounts_df['is_bot'] = predictions
146
- accounts_df['bot_probability'] = probabilities[:, 1]
147
 
148
- # Filter likely bots (e.g., probability > 0.8)
149
- likely_bots = accounts_df[accounts_df['bot_probability'] > 0.8]
150
- ```
151
 
152
- ## Training Data
 
 
 
 
 
153
 
154
- The model was trained on a curated dataset of Instagram accounts with labeled bot/human classifications. The dataset includes:
155
 
156
- - Balanced distribution of bot and human accounts
157
- - Various account types and behavioral patterns
158
- - Features extracted from public profile information
159
- - Diverse account ages and activity levels
160
 
161
- **Note**: The training data is proprietary and not included in this repository.
162
 
163
- ## Training Procedure
 
 
 
 
 
 
 
 
 
 
 
 
164
 
165
- ### Preprocessing
166
 
167
- 1. Feature extraction from Instagram account profiles
168
- 2. Calculation of derived features (e.g., FollowToFollowerRatio)
169
- 3. MinMax normalization of all features to [0, 1] range
170
- 4. Train-test split with stratification to maintain class balance
171
 
172
- ### Hyperparameters
 
 
 
173
 
174
- - **Algorithm**: Random Forest Classifier
175
- - **Normalization**: MinMaxScaler
176
- - **Cross-validation**: Stratified K-Fold
177
- - **Feature Selection**: Based on domain knowledge and feature importance
 
 
 
 
 
 
 
 
 
 
 
 
 
178
 
179
- The model was trained using scikit-learn's RandomForestClassifier with optimized hyperparameters selected through cross-validation.
 
180
 
181
- ## Limitations and Bias
 
182
 
183
- ### Limitations
 
 
184
 
185
- - Model performance depends on the quality and accuracy of input features
186
- - May not generalize to new bot patterns not seen during training
187
- - Requires accurate feature extraction from Instagram profiles
188
- - Performance may degrade over time as bot behaviors evolve
189
- - Limited to profile-level features; does not analyze content or engagement patterns
190
 
191
- ### Potential Biases
192
 
193
- - May be biased toward bot patterns present in the training data
194
- - Could have regional or cultural biases depending on training data composition
195
- - May misclassify legitimate accounts with unusual behavior patterns
196
- - Potential bias against new accounts or accounts with low activity
197
 
198
- ### Recommendations
 
 
 
 
 
 
199
 
200
- - Regularly retrain the model with new data to capture evolving bot patterns
201
- - Use as part of a multi-layered detection system
202
- - Implement human review for high-stakes decisions
203
- - Monitor for false positives and adjust classification thresholds accordingly
204
- - Combine with content analysis and engagement pattern detection
205
 
206
- ## Ethical Considerations
207
 
208
- - This model should be used responsibly and not for harassment or stalking
209
- - Consider privacy implications when analyzing user accounts
210
- - Ensure compliance with Instagram's terms of service and relevant privacy laws (GDPR, CCPA, etc.)
211
- - Implement appropriate safeguards against misuse
212
- - Provide transparency to users about automated detection systems
213
- - Allow for appeals and manual review processes
214
 
215
- ## Model Card Authors
 
 
 
216
 
217
- This model card was created as part of the Bot Detection project for social media platforms.
 
 
 
218
 
219
- ## Citation
220
 
221
- If you use this model in your research, please cite:
222
 
223
- ```bibtex
224
- @misc{instagram_bot_detection_2024,
225
- title={Instagram Bot Detection Model},
226
- author={Your Name/Organization},
227
- year={2024},
228
- publisher={Hugging Face},
229
- howpublished={\url{https://huggingface.co/your-username/instagram-bot-detection}}
230
- }
231
- ```
232
 
233
- ## Related Models
234
 
235
- - [TikTok Bot Detection](https://huggingface.co/your-username/tiktok-bot-detection)
236
- - [Twitter Bot Detection](https://huggingface.co/your-username/twitter-bot-detection)
237
 
238
- ## Contact
 
 
 
239
 
240
- For questions or feedback about this model, please open an issue in the repository or contact the maintainers.
241
 
242
- ## Updates and Maintenance
243
 
244
- - **Version**: 1.0
245
- - **Last Updated**: November 2024
246
- - **Status**: Active
247
 
248
- Future updates may include:
249
 
250
- - Improved feature engineering
251
- - Additional training data with recent bot patterns
252
- - Hyperparameter optimization
253
- - Support for new Instagram features (Reels, etc.)
254
- - Integration of content-based features
255
- - Multi-model ensemble approach
 
1
+ # Instagram Bot Detection Model
2
 
3
+ ## Overview
4
+ This directory contains a trained Random Forest classifier for detecting bot accounts on Instagram.
 
 
 
5
 
6
+ **Model Version:** v2
7
+ **Training Date:** 2025-11-27 11:38:28
8
+ **Framework:** scikit-learn 1.5.2
9
+ **Algorithm:** Random Forest Classifier with GridSearchCV Hyperparameter Tuning
10
 
11
+ ---
12
 
13
+ ## 📊 Model Performance
 
14
 
15
+ ### Final Metrics (Test Set)
16
+ | Metric | Score |
17
+ |--------|-------|
18
+ | **Accuracy** | 0.9860 (98.60%) |
19
+ | **Precision** | 0.9918 (99.18%) |
20
+ | **Recall** | 0.9796 (97.96%) |
21
+ | **F1-Score** | 0.9857 (98.57%) |
22
+ | **ROC-AUC** | 0.9990 (99.90%) |
23
+ | **Average Precision** | 0.9990 (99.90%) |
24
 
25
+ ### Model Improvement
26
+ - **Baseline ROC-AUC:** 0.9988
27
+ - **Tuned ROC-AUC:** 0.9990
28
+ - **Improvement:** 0.0002 (0.02%)
29
 
30
+ ---
31
 
32
+ ## 🗂️ Files
 
 
 
 
 
 
 
 
 
 
 
33
 
34
+ | File | Description |
35
+ |------|-------------|
36
+ | `instagram_bot_detection_v2.pkl` | Trained Random Forest model |
37
+ | `instagram_scaler_v2.pkl` | MinMaxScaler for feature normalization |
38
+ | `instagram_features_v2.json` | List of features used by the model |
39
+ | `instagram_metrics_v2.txt` | Detailed performance metrics report |
+ | `instagram_model_comparison.csv` | Baseline vs. tuned metric comparison |
40
+ | `images/` | All visualization plots (13 images) |
41
+ | `README.md` | This file |
42
 
43
+ ---
44
 
45
+ ## 🎯 Dataset Information
 
 
 
 
46
 
47
+ ### Training Configuration
48
+ - **Training Samples:** 4,000
49
+ - **Test Samples:** 1,000
50
+ - **Total Samples:** 5,000
51
+ - **Number of Features:** 10
52
+ - **Cross-Validation Folds:** 5
53
+ - **Random State:** 42
54
 
55
+ ### Class Distribution
56
+ **Training Set:**
57
+ - Human (0): 1,991 (49.78%)
58
+ - Bot (1): 2,009 (50.22%)
59
 
60
+ **Test Set:**
61
+ - Human (0): 509 (50.90%)
62
+ - Bot (1): 491 (49.10%)
63
 
64
+ ---
65
 
66
+ ## 🔧 Features (10)
 
 
67
 
68
+ 1. `profile_pic`
69
+ 2. `username_num_ratio`
70
+ 3. `username_is_numeric`
71
+ 4. `fullname_words`
72
+ 5. `fullname_num_ratio`
73
+ 6. `is_name_number_only`
74
+ 7. `name_equals_username`
75
+ 8. `followers`
76
+ 9. `follows`
77
+ 10. `followers_to_follows_ratio`
78
 
79
+ ---
80
 
81
+ ## 🏆 Top 5 Most Important Features
 
82
 
83
+ 1. **profile_pic** - 0.3314
+ 2. **followers** - 0.2313
+ 3. **username_num_ratio** - 0.1665
+ 4. **followers_to_follows_ratio** - 0.1308
+ 5. **follows** - 0.0923
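As a quick check on how concentrated the importance is, the five scores listed above can be summed. This is a minimal sketch that uses only the values reported in this README; the variable names are illustrative.

```python
# Top-5 feature importances as reported in this README.
top5_importance = {
    "profile_pic": 0.3314,
    "followers": 0.2313,
    "username_num_ratio": 0.1665,
    "followers_to_follows_ratio": 0.1308,
    "follows": 0.0923,
}

# Random Forest importances are normalized to sum to 1 across all features,
# so this sum is the share of total importance carried by the top five.
top5_share = sum(top5_importance.values())
print(f"Top-5 share of total importance: {top5_share:.4f}")
```

The remaining five features together carry the rest of the total importance.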
88
 
89
+ ---
 
 
90
 
91
+ ## ⚙️ Hyperparameters
 
 
92
 
93
+ ### Best Parameters (from GridSearchCV)
94
+ - **class_weight:** balanced
95
+ - **max_depth:** 15
96
+ - **max_features:** sqrt
97
+ - **min_samples_leaf:** 1
98
+ - **min_samples_split:** 2
99
+ - **n_estimators:** 100
100
 
101
+ ### Parameter Search Space
102
+ - **n_estimators:** [100, 200, 300]
103
+ - **max_depth:** [10, 15, 20, None]
104
+ - **min_samples_split:** [2, 5, 10]
105
+ - **min_samples_leaf:** [1, 2, 4]
106
+ - **max_features:** ['sqrt', 'log2']
107
+ - **bootstrap:** [True, False]
108
 
109
+ **Total combinations tested:** 540
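The size of a grid search is the product of the number of candidate values per parameter. The sketch below counts the search space exactly as it is listed above, using the 5 cross-validation folds stated in this README. The listed space multiplies out to 432 combinations rather than 540, and it does not include `class_weight`, which appears among the best parameters, so the grid actually used in the notebook may differ from this list.

```python
from math import prod

# Search space exactly as listed in this README.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}
cv_folds = 5  # "Cross-Validation Folds" from the training configuration

# One candidate per combination of values; each candidate is fit once per fold.
n_combinations = prod(len(values) for values in param_grid.values())
n_fits = n_combinations * cv_folds
print(f"{n_combinations} combinations, {n_fits} model fits")
```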
 
110
 
111
+ ---
 
 
112
 
113
+ ## 📈 Cross-Validation Results
 
 
114
 
115
+ ### Mean Scores (5-Fold Stratified CV)
116
+ - **Accuracy:** 0.9848 (±0.0051)
117
+ - **Precision:** 0.9900 (±0.0066)
118
+ - **Recall:** 0.9796 (±0.0081)
119
+ - **F1-Score:** 0.9847 (±0.0051)
120
+ - **ROC-AUC:** 0.9986 (±0.0011)
121
 
122
+ ---
123
 
124
+ ## 🖼️ Visualizations
 
 
 
125
 
126
+ All visualizations are saved in the `images/` directory:
127
 
128
+ 1. **01_class_distribution.png** - Training/Test set class distribution
129
+ 2. **02_feature_correlation.png** - Feature correlation with target variable
130
+ 3. **03_correlation_matrix.png** - Feature correlation heatmap
131
+ 4. **04_baseline_confusion_matrix.png** - Baseline model confusion matrix
132
+ 5. **05_baseline_roc_curve.png** - Baseline ROC curve
133
+ 6. **06_baseline_precision_recall.png** - Baseline Precision-Recall curve
134
+ 7. **07_baseline_feature_importance.png** - Baseline feature importance
135
+ 8. **08_cross_validation.png** - Cross-validation score distribution
136
+ 9. **09_tuned_confusion_matrix.png** - Tuned model confusion matrix
137
+ 10. **10_tuned_roc_curve.png** - Tuned ROC curve
138
+ 11. **11_tuned_precision_recall.png** - Tuned Precision-Recall curve
139
+ 12. **12_tuned_feature_importance.png** - Tuned feature importance
140
+ 13. **13_model_comparison.png** - Baseline vs Tuned comparison
141
 
142
+ ---
143
 
144
+ ## 🚀 Usage Example
 
 
 
145
 
146
+ ```python
147
+ import joblib
148
+ import pandas as pd
149
+ import numpy as np
150
 
151
+ # Load model and scaler
152
+ model = joblib.load('instagram_bot_detection_v2.pkl')
153
+ scaler = joblib.load('instagram_scaler_v2.pkl')
154
+
155
+ # Prepare your data (placeholder values; pass raw, unscaled features, since the scaler handles normalization)
156
+ data = {
157
+ 'profile_pic': 0.5,
158
+ 'username_num_ratio': 0.5,
159
+ 'username_is_numeric': 0.5,
160
+ 'fullname_words': 0.5,
161
+ 'fullname_num_ratio': 0.5,
162
+ 'is_name_number_only': 0.5,
163
+ 'name_equals_username': 0.5,
164
+ 'followers': 0.5,
165
+ 'follows': 0.5,
166
+ 'followers_to_follows_ratio': 0.5,
167
+ }
168
 
169
+ # Create DataFrame
170
+ df = pd.DataFrame([data])
171
 
172
+ # Scale features
173
+ df_scaled = scaler.transform(df)
174
 
175
+ # Predict
176
+ prediction = model.predict(df_scaled)[0]
177
+ probability = model.predict_proba(df_scaled)[0]
178
 
179
+ print(f"Prediction: {'Bot' if prediction == 1 else 'Human'}")
180
+ print(f"Bot Probability: {probability[1]:.4f}")
181
+ print(f"Human Probability: {probability[0]:.4f}")
182
+ ```
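The usage example above expects the ten features to be computed already, but this README does not say how they are derived from a raw profile. The sketch below is one possible derivation. The definitions are assumptions inferred from the feature names, and `build_features`, `digit_ratio`, and the keys of the `profile` dict are hypothetical names, not part of the training notebook.

```python
FEATURES = [
    "profile_pic", "username_num_ratio", "username_is_numeric",
    "fullname_words", "fullname_num_ratio", "is_name_number_only",
    "name_equals_username", "followers", "follows",
    "followers_to_follows_ratio",
]

def digit_ratio(text: str) -> float:
    """Share of characters in `text` that are digits (0.0 for empty text)."""
    return sum(ch.isdigit() for ch in text) / len(text) if text else 0.0

def build_features(profile: dict) -> dict:
    """Hypothetical mapping from raw profile fields to the 10 model inputs.

    Every definition here is an assumption inferred from the feature names.
    """
    username = profile["username"]
    fullname = profile["fullname"]
    followers = profile["followers"]
    follows = profile["follows"]
    row = {
        "profile_pic": int(profile["has_profile_pic"]),
        "username_num_ratio": digit_ratio(username),
        "username_is_numeric": int(username.isdigit()),
        "fullname_words": len(fullname.split()),
        "fullname_num_ratio": digit_ratio(fullname),
        "is_name_number_only": int(fullname.replace(" ", "").isdigit()),
        "name_equals_username": int(fullname.lower() == username.lower()),
        "followers": followers,
        "follows": follows,
        # Assumption: the ratio is 0 when the account follows nobody.
        "followers_to_follows_ratio": followers / follows if follows else 0.0,
    }
    # Return the values in the order given in instagram_features_v2.json.
    return {name: row[name] for name in FEATURES}

example = build_features({
    "username": "user12345",
    "fullname": "Jane Doe",
    "has_profile_pic": True,
    "followers": 150,
    "follows": 7500,
})
print(example)
```

If these definitions match the training pipeline, the returned dict could stand in for `data` in the usage example, since `pd.DataFrame([example])` keeps the key order.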
 
183
 
184
+ ---
185
 
186
+ ## 📋 Confusion Matrix Breakdown
 
 
 
187
 
188
+ ### Tuned Model (Test Set)
189
+ ```
+                 Predicted
+                 Human   Bot
+ Actual  Human     505     4
+         Bot        10   481
+ ```
195
 
196
+ - **True Negatives (TN):** 505 (Correctly identified humans)
197
+ - **False Positives (FP):** 4 (Humans incorrectly classified as bots)
198
+ - **False Negatives (FN):** 10 (Bots incorrectly classified as humans)
199
+ - **True Positives (TP):** 481 (Correctly identified bots)
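The headline test-set metrics can be recomputed from these four counts. A short sketch using the standard definitions, with the bot class treated as positive:

```python
# Confusion-matrix counts from the tuned model on the test set.
tn, fp, fn, tp = 505, 4, 10, 481

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of accounts flagged as bots, how many are bots
recall = tp / (tp + fn)      # of actual bots, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9860 precision=0.9918 recall=0.9796 f1=0.9857
```

These values agree with the Final Metrics table.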
 
200
 
201
+ ---
202
 
203
+ ## 🔍 Model Interpretation
 
 
 
 
 
204
 
205
+ ### Strengths
206
+ - High ROC-AUC score (0.9990) indicates excellent discrimination capability
207
+ - Balanced precision and recall for both classes
208
+ - Robust cross-validation performance
209
 
210
+ ### Key Insights
211
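+ 1. The five top-ranked features (`profile_pic`, `followers`, `username_num_ratio`, `followers_to_follows_ratio`, `follows`) account for about 95% of total feature importance
+ 2. GridSearchCV raised ROC-AUC from 0.9988 to 0.9990 (a relative gain of about 0.02%); accuracy, precision, recall, and F1 were unchanged from the baseline
+ 3. Test-set accuracy (0.9860) is in line with the 5-fold cross-validation mean (0.9848 ±0.0051), which suggests the model generalizes to held-out data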
214
 
215
+ ---
216
 
217
+ ## 📝 Notes
218
 
219
+ - **Feature Scaling:** All features are scaled using MinMaxScaler to [0, 1] range
220
+ - **Missing Values:** Filled with 0 during preprocessing
221
+ - **Class Balance:** Balanced dataset (training set: 49.78% human, 50.22% bot)
+ - **Model Type:** Random Forest, an ensemble of decision trees that is less prone to overfitting than a single tree
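The Feature Scaling note above refers to min-max normalization, which maps each feature with x' = (x - min) / (max - min), using the minimum and maximum seen during training. The sketch below works through the formula in plain Python. The training range is a made-up value for illustration; the real per-feature ranges are stored in `instagram_scaler_v2.pkl`.

```python
def min_max_scale(value: float, train_min: float, train_max: float) -> float:
    """Scale one value with x' = (x - min) / (max - min)."""
    if train_max == train_min:
        return 0.0  # constant feature; a choice made here to avoid dividing by zero
    return (value - train_min) / (train_max - train_min)

# Hypothetical training range for `followers` (not taken from the saved scaler).
train_min, train_max = 0.0, 10_000.0

inside = min_max_scale(150.0, train_min, train_max)     # within the training range
outside = min_max_scale(25_000.0, train_min, train_max)  # above the training maximum
print(inside, outside)
```

Because scikit-learn's `MinMaxScaler` does not clip by default, an account with more followers than any training example is mapped above 1, so scaled inputs at inference time are not guaranteed to stay inside [0, 1].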
 
 
 
 
 
223
 
224
+ ---
225
 
226
+ ## 🔄 Model Updates
 
227
 
228
+ To retrain the model:
229
+ 1. Place new training data in `../data/train_instagram.csv`
230
+ 2. Run the training notebook: `5_enhanced_training.ipynb`
231
+ 3. Update this README with new metrics
232
 
233
+ ---
234
 
235
+ ## 📧 Contact & Support
236
 
237
+ For questions or issues regarding this model, please refer to the main project documentation.
 
 
238
 
239
+ ---
240
 
241
+ **Generated:** 2025-11-27 11:38:28
242
+ **Notebook:** `5_enhanced_training.ipynb`
243
+ **Platform:** Instagram
 
 
 
images/01_class_distribution.png ADDED

Git LFS Details

  • SHA256: 5d6aba9af735cf0fc01dfe94dae16ca999eea115d620b8eaca71cee6fed078de
  • Pointer size: 131 Bytes
  • Size of remote file: 115 kB
images/02_feature_correlation.png ADDED

Git LFS Details

  • SHA256: e45af57e954ff7ce5db00689bc4c507ff8434db53b56cd7db88ad376b2a75e6c
  • Pointer size: 131 Bytes
  • Size of remote file: 136 kB
images/03_correlation_matrix.png ADDED

Git LFS Details

  • SHA256: eb938d6809f07ecd1a2f65861826b2d0d4d6ea8ebbbe9a6e459ccc29403aae10
  • Pointer size: 131 Bytes
  • Size of remote file: 306 kB
images/04_baseline_confusion_matrix.png ADDED
images/05_baseline_roc_curve.png ADDED

Git LFS Details

  • SHA256: 718d6e50c27dd37bb0879b9a3fb77a119e08e0d3cf2707f3cbab251df8e9584d
  • Pointer size: 131 Bytes
  • Size of remote file: 138 kB
images/06_baseline_precision_recall.png ADDED
images/07_baseline_feature_importance.png ADDED

Git LFS Details

  • SHA256: a23c568c3aa732c289d1c78d203d8266b7f8544d2396d0851e513d65bf9c69b5
  • Pointer size: 131 Bytes
  • Size of remote file: 132 kB
images/08_cross_validation.png ADDED

Git LFS Details

  • SHA256: 45a825c1046aa6f9119d9a43410a057e688e04bd7d88aeca526340c74cc5fc88
  • Pointer size: 131 Bytes
  • Size of remote file: 125 kB
images/09_tuned_confusion_matrix.png ADDED
images/10_tuned_roc_curve.png ADDED

Git LFS Details

  • SHA256: 215cd578229dbc1877b333b42991113b126414610dc89f54f6490a1697a3dc11
  • Pointer size: 131 Bytes
  • Size of remote file: 135 kB
images/11_tuned_precision_recall.png ADDED
images/12_tuned_feature_importance.png ADDED

Git LFS Details

  • SHA256: 52a78a7fe1eb004fc9fb3285702c1bd1cc4cb89235aeb6f4047d1f73e7688260
  • Pointer size: 131 Bytes
  • Size of remote file: 129 kB
images/13_model_comparison.png ADDED

Git LFS Details

  • SHA256: 2efffce38be5af7a09d76c9e08f0b3c8e5d20a037f9620633c14f81e255e746d
  • Pointer size: 131 Bytes
  • Size of remote file: 120 kB
instagram_bot_detection_v2.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:754546a9111c11b3b9e5ebe0cab64613ac81fa25043da96875a13ab47b39060c
3
+ size 1994105
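The three lines above are a Git LFS pointer: the repository stores this small text file, while the model binary lives in LFS storage. The sketch below parses the pointer into its fields; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

# Pointer content copied from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:754546a9111c11b3b9e5ebe0cab64613ac81fa25043da96875a13ab47b39060c
size 1994105
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

After downloading `instagram_bot_detection_v2.pkl`, its byte size and its SHA-256 digest (for example via `hashlib.sha256`) should match these fields, which is a reasonable check before unpickling a model file.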
instagram_features_v2.json ADDED
@@ -0,0 +1,12 @@
1
+ [
2
+ "profile_pic",
3
+ "username_num_ratio",
4
+ "username_is_numeric",
5
+ "fullname_words",
6
+ "fullname_num_ratio",
7
+ "is_name_number_only",
8
+ "name_equals_username",
9
+ "followers",
10
+ "follows",
11
+ "followers_to_follows_ratio"
12
+ ]
instagram_metrics_v2.txt ADDED
@@ -0,0 +1,39 @@
1
+ ======================================================================
2
+ INSTAGRAM Bot Detection Model - Performance Report
3
+ ======================================================================
4
+
5
+ Date: 2025-11-27 11:38:28.480839
6
+
7
+ Training Configuration:
8
+ - Platform: instagram
9
+ - Train samples: 4000
10
+ - Test samples: 1000
11
+ - Features: 10
12
+ - CV Folds: 5
13
+ - Random State: 42
14
+
15
+ Best Hyperparameters:
16
+ - class_weight: balanced
17
+ - max_depth: 15
18
+ - max_features: sqrt
19
+ - min_samples_leaf: 1
20
+ - min_samples_split: 2
21
+ - n_estimators: 100
22
+
23
+ Performance Metrics (Test Set):
24
+ - Accuracy: 0.9860
25
+ - Precision: 0.9918
26
+ - Recall: 0.9796
27
+ - F1: 0.9857
28
+ - Roc_auc: 0.9990
29
+ - Avg_precision: 0.9990
30
+
31
+ Cross-Validation Results:
32
+ - Mean ROC-AUC: 0.9988
33
+
34
+ Feature Importance (Top 5):
35
+ - profile_pic: 0.3314
36
+ - followers: 0.2313
37
+ - username_num_ratio: 0.1665
38
+ - followers_to_follows_ratio: 0.1308
39
+ - follows: 0.0923
instagram_model_comparison.csv ADDED
@@ -0,0 +1,7 @@
1
+ Metric,Baseline,Tuned,Improvement,Improvement %
2
+ Accuracy,0.986,0.986,0.0,0.0
3
+ Precision,0.9917525773195877,0.9917525773195877,0.0,0.0
4
+ Recall,0.9796334012219959,0.9796334012219959,0.0,0.0
5
+ F1-Score,0.985655737704918,0.985655737704918,0.0,0.0
6
+ ROC-AUC,0.998803612370408,0.9989916733021498,0.00018806093174172922,0.018828619501626963
7
+ Avg Precision,0.9988880328453673,0.9990380665868485,0.00015003374148114812,0.015020075979263841
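The two improvement columns can be recomputed from the Baseline and Tuned columns. The sketch below does this for the two rows that changed, assuming that "Improvement %" is measured relative to the baseline score; the CSV itself does not state the formula.

```python
import csv
import io

# ROC-AUC and Avg Precision rows copied from instagram_model_comparison.csv.
csv_text = """\
Metric,Baseline,Tuned,Improvement,Improvement %
ROC-AUC,0.998803612370408,0.9989916733021498,0.00018806093174172922,0.018828619501626963
Avg Precision,0.9988880328453673,0.9990380665868485,0.00015003374148114812,0.015020075979263841
"""

recomputed = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    baseline, tuned = float(row["Baseline"]), float(row["Tuned"])
    improvement = tuned - baseline
    # Assumption: the percentage is relative to the baseline score.
    improvement_pct = improvement / baseline * 100
    recomputed[row["Metric"]] = (improvement, improvement_pct)
    print(f"{row['Metric']}: {improvement:+.6f} ({improvement_pct:+.4f}%)")
```

Under that assumption the recomputed values match the stored columns, which is where the README's rounded "0.0002 (0.02%)" comes from.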
instagram_scaler_v2.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b3969a962396dc6ff11bc9767e60de5e75c8b5a07a156b069723e80f49dc48ed
3
+ size 1511