Commit ec7c185 · Parent: 389e084 · Updated ReadMe

Dataset/Readme.md CHANGED (+30, −34)

@@ -1,46 +1,42 @@
- # Bot Detection
-
- ## Dataset
-
- The dataset
-
- - Verified: A boolean value indicating whether the user is verified or not.
- - Bot Label: A label indicating whether the user is a bot (1) or not (0).
- - Location: The location associated with the user.
- - Created At: The date and time when the tweet was created.
- - Hashtags: The hashtags associated with the tweet.
-
- ##
-
- 5. Train the model: Train the chosen algorithm(s) on the training data.
- 6. Evaluate the model: Evaluate the model's performance using appropriate evaluation metrics.
- 7. Predict Bot or Not: Apply the trained model to new data to predict whether a user is a bot or not.
-
- ##
-
- - Random Forest
- - Gradient Boosting (XGBoost, LightGBM)
- - Support Vector Machines (SVM)
- - Neural Networks (MLPs, CNNs)
-
- Enjoy exploring the Bot Detection Dataset and discovering insights into Twitter bot accounts! 🚀🔍
+ # Social Media Bot Detection (Metadata-based)
+
+ This project focuses on detecting automated social media accounts using structured profile and behavioral metadata.
+ Instead of relying on tweet content or NLP techniques, the model analyzes account-level and activity-based features
+ to identify bot-like patterns.
+
+ ## Dataset Overview
+
+ The dataset consists of user profile and activity metadata collected at the account level.
+ Each record represents a user and includes structured numerical and boolean attributes, along with a binary label
+ indicating whether the account is automated (bot) or human-operated.
+
+ ### Example Features Used
+
+ - Follower count and following count
+ - Follower–following ratio
+ - Posting activity (status count)
+ - Account age (in days)
+ - Profile attributes (verified status, default profile settings)
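The derived attributes in the feature list, such as the follower–following ratio and account age, can be computed directly from raw profile metadata. A minimal stdlib-only sketch, assuming hypothetical field names (`followers_count`, `friends_count`, `created_at`) since the dataset's exact schema is not shown in this diff:

```python
from datetime import datetime, timezone

def derived_features(profile: dict, now: datetime) -> dict:
    """Compute behavioral indicators from raw account metadata.

    `profile` uses hypothetical field names; adjust to the dataset's
    actual schema.
    """
    followers = profile["followers_count"]
    following = profile["friends_count"]
    # Guard against division by zero for accounts that follow no one.
    ratio = followers / max(following, 1)
    created = datetime.fromisoformat(profile["created_at"])
    age_days = (now - created).days
    return {
        "follower_following_ratio": ratio,
        "account_age_days": age_days,
    }

example = {
    "followers_count": 10,
    "friends_count": 2000,  # follows far more than it is followed: bot-like
    "created_at": "2024-01-01T00:00:00+00:00",
}
feats = derived_features(example, now=datetime(2024, 12, 31, tzinfo=timezone.utc))
```

A very low ratio combined with a young account is the kind of pattern the classifier can pick up from metadata alone.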
+
+ ## Modeling Approach
+
+ - **Preprocessing:** Cleaned and standardized structured metadata features.
+ - **Feature Engineering:** Derived behavioral indicators such as follower–following ratio and account age.
+ - **Modeling:** Trained a Random Forest classifier to distinguish bot and human accounts.
+ - **Explainability:** Used feature importance to interpret which attributes influence predictions.
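The modeling and explainability steps above can be condensed into a short scikit-learn sketch. The feature columns and the synthetic sample below are illustrative stand-ins, not the project's actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature columns: followers, following, status_count,
# account_age_days, verified -- stand-ins for the dataset's real schema.
rng = np.random.default_rng(0)
n = 200
humans = np.column_stack([
    rng.integers(100, 5000, n),   # followers
    rng.integers(50, 1000, n),    # following
    rng.integers(100, 10000, n),  # status_count
    rng.integers(365, 4000, n),   # account_age_days
    rng.integers(0, 2, n),        # verified
])
bots = np.column_stack([
    rng.integers(0, 50, n),        # few followers
    rng.integers(1000, 5000, n),   # follow aggressively
    rng.integers(5000, 50000, n),  # post heavily
    rng.integers(1, 120, n),       # young accounts
    np.zeros(n, dtype=np.int64),   # never verified
])
X = np.vstack([humans, bots])
y = np.array([0] * n + [1] * n)  # 0 = human, 1 = bot

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-feature importances drive the explainability step.
importances = clf.feature_importances_
```

In the real project the same `fit` call would run on the dataset's metadata features; `feature_importances_` is what the interpretability visualization can be built from.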
+
+ ## Evaluation
+
+ Model evaluation was performed offline using standard classification metrics such as accuracy and recall.
+ The Streamlit application focuses on inference and explainability rather than live metric reporting.
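Both metrics mentioned above reduce to simple counts over a held-out set; `sklearn.metrics` provides them directly, but a stdlib-only sketch makes the definitions concrete (the labels below are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """True positives over all actual positives: the share of bots caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

# Illustrative held-out labels (1 = bot, 0 = human).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
```

Recall matters here because missing a bot (a false negative) is usually the costlier error in bot detection.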
+
+ ## Application Demo
+
+ A lightweight Streamlit interface is provided to:
+
+ - Input account metadata
+ - Generate bot or human predictions
+ - Visualize feature importance for interpretability
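The three interface steps above might be wired together as follows. This is a hypothetical sketch, not the app's actual code: the widget labels and `run_app`/`label_prediction` helpers are invented, and the Streamlit import is deferred so the pure labeling helper stays usable without Streamlit installed:

```python
def label_prediction(is_bot: bool, probability: float) -> str:
    """Turn a model output into the message shown in the UI."""
    kind = "Bot" if is_bot else "Human"
    return f"Prediction: {kind} ({probability:.0%} confidence)"

def run_app(model, feature_names):
    """Streamlit page: collect metadata, predict, plot feature importance."""
    import pandas as pd
    import streamlit as st  # deferred so importing this module needs no Streamlit

    st.title("Bot or Human?")
    values = [st.number_input(name, min_value=0.0) for name in feature_names]
    if st.button("Predict"):
        proba = model.predict_proba([values])[0][1]  # P(bot)
        st.write(label_prediction(proba >= 0.5, max(proba, 1 - proba)))
        # Feature importances from the trained Random Forest, for interpretability.
        st.bar_chart(pd.Series(model.feature_importances_, index=feature_names))
```

A script calling `run_app` with the trained classifier would then be launched with `streamlit run`.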
+
+ ## Notes
+
+ This project is a prototype intended to demonstrate machine learning workflows, feature engineering,
+ and model interpretability on structured data; it is not a production-scale deployment.