# Social Media Bot Detection (Metadata-based)
This project focuses on detecting automated social media accounts using structured profile and behavioral metadata.
Instead of relying on tweet content or NLP techniques, the model analyzes account-level and activity-based features
to identify bot-like patterns.
## Dataset Overview
The dataset consists of user profile and activity metadata collected at the account level.
Each record represents a user and includes structured numerical and boolean attributes, along with a binary label
indicating whether the account is automated (bot) or human-operated.
### Example Features Used
- Follower count and following count
- Follower–following ratio
- Posting activity (status count)
- Account age (in days)
- Profile attributes (verified status, default profile settings)
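The derived features above can be computed from a raw profile record roughly as follows. This is a minimal sketch: the `RawProfile` field names, the `engineer_features` helper, the choice of dividing by `max(following, 1)`, and measuring age against a fixed `as_of` date are all illustrative assumptions, not the project's documented schema.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical raw record layout; the dataset's actual column names may differ.
@dataclass
class RawProfile:
    followers_count: int
    following_count: int
    statuses_count: int
    created_on: date
    verified: bool
    default_profile: bool


def engineer_features(profile: RawProfile, as_of: date) -> dict:
    """Turn one raw profile into a flat numeric feature dict (hypothetical helper)."""
    # Divide by max(following, 1) so accounts that follow nobody do not raise ZeroDivisionError.
    ratio = profile.followers_count / max(profile.following_count, 1)
    # Account age in whole days, measured against a fixed reference date.
    account_age_days = (as_of - profile.created_on).days
    return {
        "followers_count": profile.followers_count,
        "following_count": profile.following_count,
        "follower_following_ratio": ratio,
        "statuses_count": profile.statuses_count,
        "account_age_days": account_age_days,
        # Booleans encoded as 0/1 so any numeric estimator can consume them directly.
        "verified": int(profile.verified),
        "default_profile": int(profile.default_profile),
    }


if __name__ == "__main__":
    p = RawProfile(followers_count=120, following_count=4800, statuses_count=15000,
                   created_on=date(2024, 1, 1), verified=False, default_profile=True)
    print(engineer_features(p, as_of=date(2024, 3, 1)))
```

Using a fixed `as_of` date rather than `date.today()` keeps the account-age feature reproducible between training and later inference runs.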
## Modeling Approach
- **Preprocessing:** Cleaned and standardized structured metadata features.
- **Feature Engineering:** Derived behavioral indicators such as follower–following ratio and account age.
- **Modeling:** Trained a Random Forest classifier to distinguish bot accounts from human-operated ones.
- **Explainability:** Used the model's feature importances to identify which attributes most influence its predictions.
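The training and explainability steps can be sketched with scikit-learn's `RandomForestClassifier`, although the README does not name a library, so that choice is an assumption. The feature names, the synthetic data, and the toy labelling rule below are purely illustrative and are not the project's dataset or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["follower_following_ratio", "statuses_count", "account_age_days",
            "verified", "default_profile"]

# Synthetic toy data for illustration only; NOT the project's dataset.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.lognormal(0.0, 1.0, n),      # follower/following ratio
    rng.integers(0, 50_000, n),      # status count
    rng.integers(1, 4000, n),        # account age in days
    rng.integers(0, 2, n),           # verified (0/1)
    rng.integers(0, 2, n),           # default profile (0/1)
]).astype(float)
# Toy labelling rule (assumption): young, very active accounts are bots (1), others human (0).
y = ((X[:, 2] < 365) & (X[:, 1] > 20_000)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances: one value per feature column, normalized to sum to 1.
ranked = sorted(zip(FEATURES, model.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked:
    print(f"{name:28s} {score:.3f}")
```

Impurity-based importances can overstate high-cardinality continuous features; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.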
## Evaluation
Model evaluation was performed offline using standard classification metrics such as accuracy and recall.
The Streamlit application focuses on inference and explainability rather than live metric reporting.
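For reference, the two metrics named above can be written out directly. This sketch assumes bots are encoded as the positive class `1`; recall is the share of actual bots the model flags, which matters when missed bots are costlier than false alarms. scikit-learn's `accuracy_score` and `recall_score` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (bots) that were flagged: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # Return 0.0 when there are no actual positives, avoiding division by zero.
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 0, 1, 1, 0]   # toy labels: 1 = bot, 0 = human
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 0.6
print(recall(y_true, y_pred))    # 0.6666666666666666
```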
## Application Demo
A lightweight Streamlit interface is provided to:
- Input account metadata
- Generate bot or human predictions
- Visualize feature importance for interpretability
## Notes
This project is a prototype that demonstrates a machine learning workflow, feature engineering, and model
interpretability on structured data; it is not intended for production-scale deployment.