metadata

title: My Hugging Face Space
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false

Check out the configuration reference at Hugging Face Spaces Config.

Social Media Toxicity Detector

A browser extension that uses a machine learning model to detect toxic content on social media platforms, including offensive language, hate speech, and spam.

Features

Detection of toxic content on Facebook, Twitter, and YouTube
Classification into 4 categories: Clean (0), Offensive (1), Hate Speech (2), and Spam (3)
Real-time content scanning on social media platforms
Manual text analysis
Admin dashboard for content monitoring and analytics
User role-based access control
Comment log and history tracking

Project Structure

The project is organized into two main components:

Backend API: FastAPI-based REST API for model inference, user management, and data storage
Browser Extension: Chrome extension for content detection and user interface

Backend Setup

Prerequisites

Python 3.9+
PostgreSQL with the pgvector extension
Virtual environment (recommended)

Installation

Clone the repository:

git clone https://github.com/yourusername/social-media-toxicity-detector.git
cd social-media-toxicity-detector

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up environment variables by creating a .env file:

# API Configuration
SECRET_KEY=your-secret-key-here
ACCESS_TOKEN_EXPIRE_MINUTES=30

# Database Configuration
POSTGRES_SERVER=localhost
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=toxicity_detector
POSTGRES_PORT=5432

# ML Model Configuration
MODEL_PATH=model/toxicity_detector.h5
HUGGINGFACE_API_URL=https://api-inference.huggingface.co/models/your-model-endpoint
HUGGINGFACE_API_TOKEN=your-huggingface-token

# Social Media APIs
FACEBOOK_API_KEY=your-facebook-api-key
TWITTER_API_KEY=your-twitter-api-key
YOUTUBE_API_KEY=your-youtube-api-key
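The backend's actual configuration module is not shown here. The following is a minimal sketch of how these variables could be read and combined into a PostgreSQL connection URL. It assumes the variables are already in the process environment (for example, loaded by python-dotenv's load_dotenv() or exported by the shell). The Settings class and load_settings function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the .env variables."""
    secret_key: str
    access_token_expire_minutes: int
    postgres_server: str
    postgres_user: str
    postgres_password: str
    postgres_db: str
    postgres_port: int
    model_path: str

    @property
    def database_url(self) -> str:
        # SQLAlchemy-style PostgreSQL URL. Credentials are percent-encoded so
        # characters such as '@' or ':' in the password do not break parsing.
        user = quote_plus(self.postgres_user)
        password = quote_plus(self.postgres_password)
        return (f"postgresql://{user}:{password}@"
                f"{self.postgres_server}:{self.postgres_port}/{self.postgres_db}")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults mirror the sample .env values; SECRET_KEY deliberately has none,
    # so a missing key fails loudly with KeyError.
    return Settings(
        secret_key=env["SECRET_KEY"],
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        postgres_server=env.get("POSTGRES_SERVER", "localhost"),
        postgres_user=env.get("POSTGRES_USER", "postgres"),
        postgres_password=env.get("POSTGRES_PASSWORD", "postgres"),
        postgres_db=env.get("POSTGRES_DB", "toxicity_detector"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        model_path=env.get("MODEL_PATH", "model/toxicity_detector.h5"),
    )
```

With only SECRET_KEY set, database_url resolves to postgresql://postgres:postgres@localhost:5432/toxicity_detector.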

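Initialize the database. The database named in POSTGRES_DB must already exist, with the pgvector extension enabled in it (CREATE EXTENSION IF NOT EXISTS vector;). Then generate and apply the migrations: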

alembic revision --autogenerate -m "Initial migration"
alembic upgrade head

Start the API server:

uvicorn backend.main:app --reload

API Documentation

Once the server is running, you can access the API documentation at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Extension Setup

Navigate to the extension directory:

cd extension

Configure the API endpoint in background.js:

const API_BASE_URL = 'http://localhost:8000/api'; // Change to your actual API endpoint

Install the extension in Chrome:
- Open Chrome and navigate to chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked" and select the extension directory

Usage

After installing the extension, click on the extension icon in the toolbar
Log in with your credentials
Visit Facebook, Twitter, or YouTube to activate content scanning
Use the extension popup to scan pages manually or analyze specific text
Access the admin dashboard at http://localhost:8000/admin (requires admin login)
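Manual text analysis can also be exercised outside the extension. The sketch below builds an authenticated request against the same API_BASE_URL the extension uses. The /predict path and the {"text": ...} request body are hypothetical; check the Swagger UI at /docs for the real route and schema. It assumes the JWT obtained at login is sent as a bearer token.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000/api"  # same value as in extension/background.js


def build_analyze_request(text: str, access_token: str,
                          endpoint: str = "/predict") -> urllib.request.Request:
    """Build a POST request for manual text analysis (endpoint is hypothetical)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_BASE_URL + endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # JWT returned by the login endpoint, sent as a bearer token
            "Authorization": f"Bearer {access_token}",
        },
    )


def analyze(text: str, access_token: str) -> dict:
    # Actually sends the request; requires the API server to be running.
    with urllib.request.urlopen(build_analyze_request(text, access_token)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the request shape easy to inspect without a running server.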

Model Training

The toxicity detection model was trained on a dataset with four labels:

0: Clean content
1: Offensive content
2: Hate speech
3: Spam
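Assuming the model ends in a four-way output layer whose scores follow the label order above, one output row can be decoded by taking the highest-scoring index. This is a minimal sketch; LABELS and decode_prediction are illustrative names, not part of the project's code.

```python
from typing import Sequence, Tuple

# Label ids as listed above.
LABELS = {0: "Clean", 1: "Offensive", 2: "Hate Speech", 3: "Spam"}


def decode_prediction(probabilities: Sequence[float]) -> Tuple[str, float]:
    """Map one row of 4-way model output to (label, confidence) via argmax."""
    if len(probabilities) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(probabilities)}")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best], float(probabilities[best])
```

For example, decode_prediction([0.1, 0.7, 0.15, 0.05]) returns ("Offensive", 0.7).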

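Place the model file (toxicity_detector.h5) in the model directory, which matches the default MODEL_PATH. Alternatively, serve the model through the Hugging Face Inference API and set HUGGINGFACE_API_URL and HUGGINGFACE_API_TOKEN in .env.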
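When the Hugging Face route is used, the Inference API accepts a POST with a JSON body of the form {"inputs": ...} and the token in an Authorization: Bearer header. The sketch below builds such a request from the .env values. It also picks the top label from a response, assuming the endpoint returns the common text-classification shape: a list of {"label", "score"} objects, possibly wrapped in an outer list. Whether the custom endpoint actually returns that shape is an assumption. The function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional, Tuple


def build_hf_request(text: str, api_url: Optional[str] = None,
                     token: Optional[str] = None) -> urllib.request.Request:
    # Fall back to HUGGINGFACE_API_URL / HUGGINGFACE_API_TOKEN from the environment.
    api_url = api_url or os.environ["HUGGINGFACE_API_URL"]
    token = token or os.environ["HUGGINGFACE_API_TOKEN"]
    return urllib.request.Request(
        api_url,
        data=json.dumps({"inputs": text}).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def top_label(response: list) -> Tuple[str, float]:
    """Return (label, score) of the highest-scoring entry.

    Assumes either a flat list of {"label", "score"} dicts or a list whose
    first element is such a list (one inner list per input text).
    """
    scores = response[0] if response and isinstance(response[0], list) else response
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]
```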

Database Schema

The system uses PostgreSQL with the pgvector extension for vector similarity search. It stores the following data:

Users: User accounts with role-based permissions
Roles: User roles (admin, moderator, user)
Comments: Detected comments with classification results and vector embeddings
Logs: System activity logs
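Similarity search over comment embeddings can be illustrated without a database. pgvector's <=> operator returns cosine distance (1 minus cosine similarity), so a nearest-neighbour query orders rows by that value. The sketch below reproduces the ranking in memory. The table and column names in the SQL comment (comments, embedding) are assumptions about the schema, not taken from the project.

```python
import math
from typing import List, Sequence, Tuple

# Roughly equivalent pgvector query (table/column names are hypothetical):
#   SELECT id FROM comments ORDER BY embedding <=> :query_vector LIMIT :k;


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def nearest(query: Sequence[float],
            rows: List[Tuple[int, Sequence[float]]],
            k: int = 3) -> List[int]:
    """In-memory stand-in for the query above: ids of the k closest embeddings."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]
```

For example, nearest([1, 0], [(1, [1, 0]), (2, [0, 1]), (3, [1, 1])], k=2) returns [1, 3].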

Security Features

JWT authentication
Role-based access control
Password hashing with bcrypt
Request logging
Input validation and sanitization
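To show what JWT authentication with SECRET_KEY and ACCESS_TOKEN_EXPIRE_MINUTES involves, here is a standard-library sketch of HS256 token creation and verification. The signing algorithm (HS256), the claim names, and the function names are assumptions; the backend most likely relies on a dedicated JWT library, which is the better choice in practice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(subject: str, secret_key: str, expire_minutes: int = 30,
                        now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret_key))}"


def verify_access_token(token: str, secret_key: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```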

License

MIT License