nahiar/instagram-bot-detection

@@ -1,243 +1,11 @@
-# Instagram Bot Detection Model
-## Overview
-This directory contains a trained Random Forest classifier for detecting bot accounts on Instagram.
-**Model Version:** v2
-**Training Date:** 2025-11-27 11:38:28
-**Framework:** scikit-learn 1.5.2
-**Algorithm:** Random Forest Classifier with GridSearchCV Hyperparameter Tuning
----
-## 📊 Model Performance
-### Final Metrics (Test Set)
-| Metric | Score |
-|--------|-------|
-| **Accuracy** | 0.9860 (98.60%) |
-| **Precision** | 0.9918 (99.18%) |
-| **Recall** | 0.9796 (97.96%) |
-| **F1-Score** | 0.9857 (98.57%) |
-| **ROC-AUC** | 0.9990 (99.90%) |
-| **Average Precision** | 0.9990 (99.90%) |
-### Model Improvement
-- **Baseline ROC-AUC:** 0.9988
-- **Tuned ROC-AUC:** 0.9990
-- **Improvement:** +0.0002 (0.02 percentage points)
----
-## 🗂️ Files
-| File | Description |
-|------|-------------|
-| `instagram_bot_detection_v2.pkl` | Trained Random Forest model |
-| `instagram_scaler_v2.pkl` | MinMaxScaler for feature normalization |
-| `instagram_features_v2.json` | List of features used by the model |
-| `instagram_metrics_v2.txt` | Detailed performance metrics report |
-| `images/` | All visualization plots (13 images) |
-| `README.md` | This file |
----
-## 🎯 Dataset Information
-### Training Configuration
-- **Training Samples:** 4,000
-- **Test Samples:** 1,000
-- **Total Samples:** 5,000
-- **Number of Features:** 10
-- **Cross-Validation Folds:** 5
-- **Random State:** 42
-### Class Distribution
-**Training Set:**
-- Human (0): 1,991 (49.78%)
-- Bot (1): 2,009 (50.22%)
-**Test Set:**
-- Human (0): 509 (50.90%)
-- Bot (1): 491 (49.10%)
----
-## 🔧 Features (10)
-1. `profile_pic`
-2. `username_num_ratio`
-3. `username_is_numeric`
-4. `fullname_words`
-5. `fullname_num_ratio`
-6. `is_name_number_only`
-7. `name_equals_username`
-8. `followers`
-9. `follows`
-10. `followers_to_follows_ratio`
----
-## 🏆 Top 5 Most Important Features
-1. **profile_pic** (feature 1) - 0.3314
-2. **followers** (feature 8) - 0.2313
-3. **username_num_ratio** (feature 2) - 0.1665
-4. **followers_to_follows_ratio** (feature 10) - 0.1308
-5. **follows** (feature 9) - 0.0923
----
-## ⚙️ Hyperparameters
-### Best Parameters (from GridSearchCV)
-- **class_weight:** balanced
-- **max_depth:** 15
-- **max_features:** sqrt
-- **min_samples_leaf:** 1
-- **min_samples_split:** 2
-- **n_estimators:** 100
-### Parameter Search Space
-- **n_estimators:** [100, 200, 300]
-- **max_depth:** [10, 15, 20, None]
-- **min_samples_split:** [2, 5, 10]
-- **min_samples_leaf:** [1, 2, 4]
-- **max_features:** ['sqrt', 'log2']
-- **bootstrap:** [True, False]
-**Total combinations tested:** 540
----
-## 📈 Cross-Validation Results
-### Mean Scores (5-Fold Stratified CV)
-- **Accuracy:** 0.9848 (±0.0051)
-- **Precision:** 0.9900 (±0.0066)
-- **Recall:** 0.9796 (±0.0081)
-- **F1-Score:** 0.9847 (±0.0051)
-- **ROC-AUC:** 0.9986 (±0.0011)
 ---
-## 🖼️ Visualizations
-All visualizations are saved in the `images/` directory:
-1. **01_class_distribution.png** - Training/Test set class distribution
-2. **02_feature_correlation.png** - Feature correlation with target variable
-3. **03_correlation_matrix.png** - Feature correlation heatmap
-4. **04_baseline_confusion_matrix.png** - Baseline model confusion matrix
-5. **05_baseline_roc_curve.png** - Baseline ROC curve
-6. **06_baseline_precision_recall.png** - Baseline Precision-Recall curve
-7. **07_baseline_feature_importance.png** - Baseline feature importance
-8. **08_cross_validation.png** - Cross-validation score distribution
-9. **09_tuned_confusion_matrix.png** - Tuned model confusion matrix
-10. **10_tuned_roc_curve.png** - Tuned ROC curve
-11. **11_tuned_precision_recall.png** - Tuned Precision-Recall curve
-12. **12_tuned_feature_importance.png** - Tuned feature importance
-13. **13_model_comparison.png** - Baseline vs Tuned comparison
----
-## 🚀 Usage Example
-```python
-import joblib
-import pandas as pd
-# Load the trained model and the fitted MinMaxScaler
-model = joblib.load('instagram_bot_detection_v2.pkl')
-scaler = joblib.load('instagram_scaler_v2.pkl')
-# Raw (unscaled) feature values for one account; 0.5 is only a placeholder.
-# Keys must match the training features (see instagram_features_v2.json),
-# in the same order the scaler was fitted on.
-data = {
-    'profile_pic': 0.5,
-    'username_num_ratio': 0.5,
-    'username_is_numeric': 0.5,
-    'fullname_words': 0.5,
-    'fullname_num_ratio': 0.5,
-    'is_name_number_only': 0.5,
-    'name_equals_username': 0.5,
-    'followers': 0.5,
-    'follows': 0.5,
-    'followers_to_follows_ratio': 0.5,
-}
-# Build a one-row DataFrame
-df = pd.DataFrame([data])
-# Scale features with the scaler fitted during training
-df_scaled = scaler.transform(df)
-# Predict the class (0 = Human, 1 = Bot) and the per-class probabilities
-prediction = model.predict(df_scaled)[0]
-probability = model.predict_proba(df_scaled)[0]
-print(f"Prediction: {'Bot' if prediction == 1 else 'Human'}")
-print(f"Bot Probability: {probability[1]:.4f}")
-print(f"Human Probability: {probability[0]:.4f}")
-```
 ---
-## 📋 Confusion Matrix Breakdown
-### Tuned Model (Test Set)
-```
-                Predicted
-              Human    Bot
-Actual Human      505       4
-       Bot         10     481
-```
-- **True Negatives (TN):** 505 (Correctly identified humans)
-- **False Positives (FP):** 4 (Humans incorrectly classified as bots)
-- **False Negatives (FN):** 10 (Bots incorrectly classified as humans)
-- **True Positives (TP):** 481 (Correctly identified bots)
----
-## 🔍 Model Interpretation
-### Strengths
-- High ROC-AUC score (0.9990) indicates excellent discrimination capability
-- Precision (0.9918) and recall (0.9796) for the bot class are both high and close to each other
-- Stable cross-validation performance (accuracy 0.9848 ± 0.0051 over 5 stratified folds)
-### Key Insights
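-1. The five most important features account for about 95% of total feature importance, led by `profile_pic` (0.3314)
-2. GridSearchCV raised ROC-AUC only marginally over the baseline (0.9988 to 0.9990, a gain of 0.0002)
-3. Test-set accuracy (0.9860) is in line with the cross-validation mean (0.9848), which suggests the model generalizes to held-out data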
----
-## 📝 Notes
-- **Feature Scaling:** All features are scaled using MinMaxScaler to [0, 1] range
-- **Missing Values:** Filled with 0 during preprocessing
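-- **Class Balance:** Approximately balanced (about 50% bots in both the training and test sets)
-- **Model Type:** Random Forest, a bagged ensemble of decision trees that is generally less prone to overfitting than a single tree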
----
-## 🔄 Model Updates
-To retrain the model:
-1. Place new training data in `../data/train_instagram.csv`
-2. Run the training notebook: `5_enhanced_training.ipynb`
-3. Update this README with new metrics
----
-## 📧 Contact & Support
-For questions or issues regarding this model, please refer to the main project documentation.
----
-**Generated:** 2025-11-27 11:38:28
-**Notebook:** `5_enhanced_training.ipynb`
-**Platform:** Instagram

 ---
+language: "en"
+license: "apache-2.0"
+created: "2025-11-27T05:32:51.193018Z"
 ---
+# nahiar/instagram-bot-detection
+A short description of this model.
+-- Add details for: how to use, training data, limitations, citation, and license.
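
Since this change removes the detailed model card, it may be worth checking that the figures in the removed README agree with each other before they are carried over elsewhere. The sketch below is a minimal check that uses only the Python standard library: it recomputes accuracy, precision, recall, and F1 from the confusion-matrix counts in the removed README, and enumerates the parameter search space as listed there. The variable names are illustrative and are not taken from the training notebook.

```python
from itertools import product

# Confusion-matrix counts copied from the removed README (tuned model, test set).
tn, fp, fn, tp = 505, 4, 10, 481

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted bots that are real bots
recall = tp / (tp + fn)      # share of real bots that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")

# Search space copied from the removed README ("Parameter Search Space").
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}

# Count every combination in the Cartesian product of the listed values.
n_combinations = sum(1 for _ in product(*param_grid.values()))
print(f"grid combinations={n_combinations}")
```

With the counts as given, the four metrics reproduce the values in the removed metrics table. The search space as listed enumerates to 432 combinations rather than the 540 stated, and the best parameters include `class_weight`, which the listed search space omits, so the reported total may refer to a different grid or to a different count; the README does not say which.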