Commit ec7c185 · Parent: 389e084 · Updated ReadMe

Dataset/Readme.md CHANGED (+30, −34)

@@ -1,46 +1,42 @@
- # Bot Detection
-
- ## Dataset
-
- The dataset
-
- - Verified: A boolean value indicating whether the user is verified or not.
- - Bot Label: A label indicating whether the user is a bot (1) or not (0).
- - Location: The location associated with the user.
- - Created At: The date and time when the tweet was created.
- - Hashtags: The hashtags associated with the tweet.
-
- ##
-
- 5. Train the model: Train the chosen algorithm(s) on the training data.
- 6. Evaluate the model: Evaluate the model's performance using appropriate evaluation metrics.
- 7. Predict Bot or Not: Apply the trained model to new data to predict whether a user is a bot or not.
-
- ##
-
- - Random Forest
- - Gradient Boosting (XGBoost, LightGBM)
- - Support Vector Machines (SVM)
- - Neural Networks (MLPs, CNNs)
-
- Enjoy exploring the Bot Detection Dataset and discovering insights into Twitter bot accounts! 🚀🔍
+ # Social Media Bot Detection (Metadata-based)
+
+ This project focuses on detecting automated social media accounts using structured profile and behavioral metadata.
+ Instead of relying on tweet content or NLP techniques, the model analyzes account-level and activity-based features
+ to identify bot-like patterns.
+
+ ## Dataset Overview
+
+ The dataset consists of user profile and activity metadata collected at the account level.
+ Each record represents a user and includes structured numerical and boolean attributes, along with a binary label
+ indicating whether the account is automated (bot) or human-operated.
+
+ ### Example Features Used
+
+ - Follower count and following count
+ - Follower–following ratio
+ - Posting activity (status count)
+ - Account age (in days)
+ - Profile attributes (verified status, default profile settings)
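The derived attributes in the feature list, such as the follower–following ratio and account age, can be computed directly from raw profile metadata. A minimal stdlib-only sketch, assuming hypothetical field names (`followers_count`, `friends_count`, `created_at`) since the dataset's exact schema is not shown in this diff:

```python
from datetime import datetime, timezone

def derived_features(profile: dict, now: datetime) -> dict:
    """Compute behavioral indicators from raw account metadata.

    `profile` uses hypothetical field names; adjust to the dataset's
    actual schema.
    """
    followers = profile["followers_count"]
    following = profile["friends_count"]
    # Guard against division by zero for accounts that follow no one.
    ratio = followers / max(following, 1)
    created = datetime.fromisoformat(profile["created_at"])
    age_days = (now - created).days
    return {
        "follower_following_ratio": ratio,
        "account_age_days": age_days,
    }

example = {
    "followers_count": 10,
    "friends_count": 2000,  # follows far more than it is followed: bot-like
    "created_at": "2024-01-01T00:00:00+00:00",
}
feats = derived_features(example, now=datetime(2024, 12, 31, tzinfo=timezone.utc))
```

A very low ratio combined with a young account is the kind of pattern the classifier can pick up from metadata alone.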
+
+ ## Modeling Approach
+
+ - **Preprocessing:** Cleaned and standardized structured metadata features.
+ - **Feature Engineering:** Derived behavioral indicators such as follower–following ratio and account age.
+ - **Modeling:** Trained a Random Forest classifier to distinguish bot and human accounts.
+ - **Explainability:** Used feature importance to interpret which attributes influence predictions.
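The modeling and explainability steps above can be condensed into a short scikit-learn sketch. The feature columns and the synthetic sample below are illustrative stand-ins, not the project's actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature columns: followers, following, status_count,
# account_age_days, verified -- stand-ins for the dataset's real schema.
rng = np.random.default_rng(0)
n = 200
humans = np.column_stack([
    rng.integers(100, 5000, n),   # followers
    rng.integers(50, 1000, n),    # following
    rng.integers(100, 10000, n),  # status_count
    rng.integers(365, 4000, n),   # account_age_days
    rng.integers(0, 2, n),        # verified
])
bots = np.column_stack([
    rng.integers(0, 50, n),        # few followers
    rng.integers(1000, 5000, n),   # follow aggressively
    rng.integers(5000, 50000, n),  # post heavily
    rng.integers(1, 120, n),       # young accounts
    np.zeros(n, dtype=np.int64),   # never verified
])
X = np.vstack([humans, bots])
y = np.array([0] * n + [1] * n)  # 0 = human, 1 = bot

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-feature importances drive the explainability step.
importances = clf.feature_importances_
```

In the real project the same `fit` call would run on the dataset's metadata features; `feature_importances_` is what the interpretability visualization can be built from.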
+
+ ## Evaluation
+
+ Model evaluation was performed offline using standard classification metrics such as accuracy and recall.
+ The Streamlit application focuses on inference and explainability rather than live metric reporting.
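Both metrics mentioned above reduce to simple counts over a held-out set; `sklearn.metrics` provides them directly, but a stdlib-only sketch makes the definitions concrete (the labels below are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """True positives over all actual positives: the share of bots caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

# Illustrative held-out labels (1 = bot, 0 = human).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
```

Recall matters here because missing a bot (a false negative) is usually the costlier error in bot detection.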
+
+ ## Application Demo
+
+ A lightweight Streamlit interface is provided to:
+
+ - Input account metadata
+ - Generate bot or human predictions
+ - Visualize feature importance for interpretability
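The three interface steps above might be wired together as follows. This is a hypothetical sketch, not the app's actual code: the widget labels and `run_app`/`label_prediction` helpers are invented, and the Streamlit import is deferred so the pure labeling helper stays usable without Streamlit installed:

```python
def label_prediction(is_bot: bool, probability: float) -> str:
    """Turn a model output into the message shown in the UI."""
    kind = "Bot" if is_bot else "Human"
    return f"Prediction: {kind} ({probability:.0%} confidence)"

def run_app(model, feature_names):
    """Streamlit page: collect metadata, predict, plot feature importance."""
    import pandas as pd
    import streamlit as st  # deferred so importing this module needs no Streamlit

    st.title("Bot or Human?")
    values = [st.number_input(name, min_value=0.0) for name in feature_names]
    if st.button("Predict"):
        proba = model.predict_proba([values])[0][1]  # P(bot)
        st.write(label_prediction(proba >= 0.5, max(proba, 1 - proba)))
        # Feature importances from the trained Random Forest, for interpretability.
        st.bar_chart(pd.Series(model.feature_importances_, index=feature_names))
```

A script calling `run_app` with the trained classifier would then be launched with `streamlit run`.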
+
+ ## Notes
+
+ This project is a prototype intended to demonstrate machine learning workflows, feature engineering,
+ and model interpretability on structured data; it is not a production-scale deployment.