---
title: My Hugging Face Space
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
---
Check out the configuration reference at Hugging Face Spaces Config.
Social Media Toxicity Detector
A browser extension that uses a machine learning model to detect toxic content on social media platforms, including offensive language, hate speech, and spam.
Features
- Detection of toxic content on Facebook, Twitter, and YouTube
- Classification into 4 categories: Clean (0), Offensive (1), Hate Speech (2), and Spam (3)
- Real-time content scanning on social media platforms
- Manual text analysis
- Admin dashboard for content monitoring and analytics
- User role-based access control
- Comment log and history tracking
Project Structure
The project is organized into two main components:
- Backend API: FastAPI-based REST API for model inference, user management, and data storage
- Browser Extension: Chrome extension for content detection and user interface
Backend Setup
Prerequisites
- Python 3.9+
- PostgreSQL with the pgvector extension
- Virtual environment (recommended)
Installation
- Clone the repository:
git clone https://github.com/yourusername/social-media-toxicity-detector.git
cd social-media-toxicity-detector
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables by creating a .env file:
# API Configuration
SECRET_KEY=your-secret-key-here
ACCESS_TOKEN_EXPIRE_MINUTES=30
# Database Configuration
POSTGRES_SERVER=localhost
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=toxicity_detector
POSTGRES_PORT=5432
# ML Model Configuration
MODEL_PATH=model/toxicity_detector.h5
HUGGINGFACE_API_URL=https://api-inference.huggingface.co/models/your-model-endpoint
HUGGINGFACE_API_TOKEN=your-huggingface-token
# Social Media APIs
FACEBOOK_API_KEY=your-facebook-api-key
TWITTER_API_KEY=your-twitter-api-key
YOUTUBE_API_KEY=your-youtube-api-key
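The backend has to turn these variables into a database connection string and typed settings. The sketch below shows one way to do that with only the standard library; the `Settings` class and `load_settings` function are hypothetical names, and it assumes the .env values have already been exported into the process environment (for example by python-dotenv or pydantic settings, whichever the backend actually uses).

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the variables in .env."""
    secret_key: str
    access_token_expire_minutes: int
    postgres_server: str
    postgres_user: str
    postgres_password: str
    postgres_db: str
    postgres_port: int
    model_path: str

    @property
    def database_url(self) -> str:
        # SQLAlchemy-style PostgreSQL URL; the password is URL-quoted so
        # characters such as '@' or '/' cannot break the URL structure.
        return (
            f"postgresql://{self.postgres_user}:{quote_plus(self.postgres_password)}"
            f"@{self.postgres_server}:{self.postgres_port}/{self.postgres_db}"
        )


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults match the example .env above; SECRET_KEY deliberately has no
    # default, so a missing key raises KeyError instead of running insecurely.
    return Settings(
        secret_key=env["SECRET_KEY"],
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        postgres_server=env.get("POSTGRES_SERVER", "localhost"),
        postgres_user=env.get("POSTGRES_USER", "postgres"),
        postgres_password=env.get("POSTGRES_PASSWORD", "postgres"),
        postgres_db=env.get("POSTGRES_DB", "toxicity_detector"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        model_path=env.get("MODEL_PATH", "model/toxicity_detector.h5"),
    )


settings = load_settings({"SECRET_KEY": "dev-only"})
print(settings.database_url)  # postgresql://postgres:postgres@localhost:5432/toxicity_detector
```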
- Initialize the database:
alembic revision --autogenerate -m "Initial migration"
alembic upgrade head
- Start the API server:
uvicorn backend.main:app --reload
API Documentation
Once the server is running, you can access the API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
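The Swagger UI lists the real routes. As a rough illustration of what a client such as the extension sends, the sketch below builds (without sending) an authenticated JSON request. The `/predict` path and the `{"text": ...}` body are hypothetical placeholders; check the generated docs for the actual endpoint and schema. The base URL matches the default in background.js.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000/api"  # same default as background.js


def build_predict_request(text: str, token: str, base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build a POST request to a hypothetical /predict endpoint (not sent)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # JWT obtained at login, sent as a standard bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_predict_request("you are awesome", token="example-token")
# Against a running server you would send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```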
Extension Setup
- Navigate to the extension directory:
cd extension
- Configure the API endpoint in background.js:
const API_BASE_URL = 'http://localhost:8000/api'; // Change to your actual API endpoint
- Install the extension in Chrome:
  - Open Chrome and navigate to chrome://extensions/
  - Enable "Developer mode"
  - Click "Load unpacked" and select the extension directory
Usage
- After installing the extension, click on the extension icon in the toolbar
- Log in with your credentials
- Visit Facebook, Twitter, or YouTube to activate content scanning
- Use the extension popup to scan pages manually or analyze specific text
- Access the admin dashboard at http://localhost:8000/admin (requires admin login)
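The admin dashboard is gated by role. A minimal sketch of such a check, assuming the three roles (admin, moderator, user) form a strict hierarchy in which higher roles inherit lower roles' permissions; the route-to-role table is hypothetical and only `/admin` comes from this README.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles from the schema, ordered by privilege (ordering is an assumption)."""
    USER = 1
    MODERATOR = 2
    ADMIN = 3


# Hypothetical route-to-role table; only /admin is documented above.
REQUIRED_ROLE = {
    "/admin": Role.ADMIN,
    "/api/comments": Role.MODERATOR,
    "/api/predict": Role.USER,
}


def can_open(path: str, user_role: Role) -> bool:
    """Allow access if the user's role is at least the route's required role."""
    required = REQUIRED_ROLE.get(path)
    if required is None:
        return False  # deny by default for routes not in the table
    return user_role >= required


print(can_open("/admin", Role.MODERATOR))  # False
print(can_open("/admin", Role.ADMIN))      # True
```

Denying unknown routes by default keeps a newly added endpoint closed until someone assigns it a role.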
Model Training
The toxicity detection model was trained using a dataset with 4 labels:
- 0: Clean content
- 1: Offensive content
- 2: Hate speech
- 3: Spam
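The model file (.h5) should be placed in the model/ directory (the default MODEL_PATH is model/toxicity_detector.h5) or served through the Hugging Face Inference API configured by HUGGINGFACE_API_URL and HUGGINGFACE_API_TOKEN.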
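Whichever way the model is served, its raw output has to be mapped back to the four labels. The sketch below assumes the model returns one score per class (for example softmax probabilities) in the same order as the label indices 0 to 3; `decode_prediction` is a hypothetical helper, not part of the project.

```python
from typing import Sequence, Tuple

# Label indices as listed above.
LABELS = {0: "Clean", 1: "Offensive", 2: "Hate Speech", 3: "Spam"}


def decode_prediction(scores: Sequence[float]) -> Tuple[int, str, float]:
    """Return (label_id, label_name, score) for the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    # argmax without NumPy: index of the largest score
    label_id = max(range(len(scores)), key=lambda i: scores[i])
    return label_id, LABELS[label_id], float(scores[label_id])


print(decode_prediction([0.1, 0.7, 0.15, 0.05]))  # (1, 'Offensive', 0.7)
```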
Database Schema
The system uses PostgreSQL with the pgvector extension for vector similarity search over comment embeddings:
- Users: User accounts with role-based permissions
- Roles: User roles (admin, moderator, user)
- Comments: Detected comments with classification results and vector embeddings
- Logs: System activity logs
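Similarity search over the Comments embeddings ranks stored vectors by distance to a query vector. The brute-force sketch below computes cosine distance (1 minus cosine similarity), the quantity pgvector's `<=>` operator returns; in the database the equivalent would be a query along the lines of `SELECT id FROM comments ORDER BY embedding <=> :query LIMIT 3`, where the table and column names are assumptions. Which distance the project actually uses is not stated.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 for identical directions, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_comments(
    query: Sequence[float], stored: Dict[int, Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return the k comment ids closest to query, as (comment_id, distance)."""
    scored = sorted((cosine_distance(query, emb), cid) for cid, emb in stored.items())
    return [(cid, dist) for dist, cid in scored[:k]]


embeddings = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
print([cid for cid, _ in nearest_comments([1.0, 0.1], embeddings, k=2)])  # [1, 3]
```

A linear scan like this is fine for a handful of rows; pgvector's indexes exist so the database does not have to score every comment.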
Security Features
- JWT authentication
- Role-based access control
- Password hashing with bcrypt
- Request logging
- Input validation and sanitization
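To make the JWT flow concrete, here is a standard-library sketch of issuing and verifying an HS256 token whose lifetime comes from ACCESS_TOKEN_EXPIRE_MINUTES. It is illustrative only: the README does not say which JWT library the backend uses, and a real deployment should rely on a maintained one (such as PyJWT) rather than this code. The `role` claim and the function names are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(subject, role, secret_key, expire_minutes=30, now=None):
    """Issue an HS256 JWT that expires expire_minutes after issue time."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued, "exp": issued + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret_key))}"


def verify_access_token(token, secret_key, now=None):
    """Return the payload if the token is well-formed, correctly signed and unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Reject anything but HS256 so an attacker cannot pick the algorithm.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = create_access_token("alice", "admin", "your-secret-key-here", expire_minutes=30)
print(verify_access_token(token, "your-secret-key-here")["role"])  # admin
```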