---
title: AI Slop Detector
emoji: 🔍
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

# 🔍 AI Slop Detector
A comprehensive Python API and web UI for detecting AI-generated content in PDFs, DOCX files, and raw text. Uses an ensemble of state-of-the-art detection methods.
## Features

### ✨ Multi-Detector Ensemble
- **RoBERTa Classifier** - Fine-tuned RoBERTa model for AI text detection
- **Perplexity Analysis** - Detects statistical anomalies and repetitive patterns
- **LLMDet** - Entropy and log-probability based detection
- **HuggingFace Classifier** - Generic transformer-based classification
- **OUTFOX Statistical** - Word/sentence length and vocabulary analysis
### ✨ Easy Feature Flags
- Enable/disable each detector with a single config change
- Adjust detector weights for ensemble averaging
- Environment variable overrides
### ✨ Multiple File Formats
- PDF documents
- DOCX/DOC files
- Plain text files
- Raw text input
### ✨ Persistent Storage
- SQLite database (default, configurable)
- Upload history with timestamps
- Detailed result tracking and statistics
### ✨ Web UI
- Beautiful, responsive interface
- Drag-and-drop file upload
- Real-time analysis results
- History and statistics views
### ✨ REST API
- Analyze text and files via HTTP
- Get historical results
- Query statistics
- Full result management
## Installation

### Prerequisites

- Python 3.8+
- pip or conda

### Setup
1. Clone or navigate to the project:

   ```bash
   cd slop-detect
   ```

2. Create a Python virtual environment:

   ```bash
   python -m venv venv

   # On Windows:
   venv\Scripts\activate

   # On macOS/Linux:
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r backend/requirements.txt
   ```
## Configuration

### Enable/Disable Detectors

Edit `backend/config/detectors_config.py`:

```python
ENABLED_DETECTORS: Dict[str, bool] = {
    "roberta": True,        # Enable RoBERTa
    "perplexity": True,     # Enable Perplexity
    "llmdet": True,         # Enable LLMDet
    "hf_classifier": True,  # Enable HF Classifier
    "outfox": False,        # Disable OUTFOX
}
```
### Set Detector Weights

```python
DETECTOR_WEIGHTS: Dict[str, float] = {
    "roberta": 0.30,        # 30% weight
    "perplexity": 0.25,     # 25% weight
    "llmdet": 0.25,         # 25% weight
    "hf_classifier": 0.20,  # 20% weight
    "outfox": 0.00,         # 0% weight (not used)
}
```
### Environment-based Configuration

You can also use environment variables to override the config:

```bash
# Enable/disable detectors
export ENABLE_ROBERTA=true
export ENABLE_PERPLEXITY=true
export ENABLE_LLMDET=true
export ENABLE_HF_CLASSIFIER=true
export ENABLE_OUTFOX=false

# Database
export DATABASE_URL=sqlite:///slop_detect.db
export UPLOAD_FOLDER=./uploads

# Flask
export HOST=0.0.0.0
export PORT=5000
export DEBUG=False
```
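As a sketch of how such boolean overrides might be parsed (the project's actual settings code may differ), a tolerant flag reader looks like this:

```python
import os

def env_flag(name: str, default: bool) -> bool:
    """Read a boolean feature flag such as ENABLE_ROBERTA from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Accept common truthy spellings; anything else counts as False
    return raw.strip().lower() in ("1", "true", "yes", "on")
```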
## Running the Application

### Start the Flask Server

```bash
cd backend
python main.py
```

The API will be available at `http://localhost:5000`.
## API Endpoints

### Health Check

```http
GET /api/health
```

### Analyze Text

```http
POST /api/analyze/text
Content-Type: application/json
```

```json
{
  "text": "Your text here...",
  "filename": "optional_name.txt",
  "user_id": "optional_user_id"
}
```
Response:

```json
{
  "status": "success",
  "result_id": 1,
  "overall_ai_score": 0.78,
  "overall_ai_score_percentage": "78.0%",
  "overall_confidence": "high",
  "status_label": "Likely AI",
  "detector_results": {
    "roberta": {
      "detector_name": "roberta",
      "score": 0.85,
      "confidence": "high",
      "explanation": "Very strong indicators of AI-generated text..."
    },
    ...
  },
  "enabled_detectors": ["roberta", "perplexity", "llmdet", "hf_classifier"],
  "text_stats": {
    "character_count": 1500,
    "word_count": 250,
    "sentence_count": 15,
    "average_word_length": 4.8
  }
}
```
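The `text_stats` fields can be reproduced with simple string operations. A minimal sketch (the server's exact tokenization and rounding may differ):

```python
import re

def text_stats(text: str) -> dict:
    """Compute the summary statistics returned in the text_stats block."""
    words = text.split()
    # Sentences are approximated by splitting on terminal punctuation
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg = sum(len(w) for w in words) / max(len(words), 1)
    return {
        "character_count": len(text),
        "word_count": len(words),
        "sentence_count": len(sentences),
        "average_word_length": round(avg, 1),
    }
```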
### Analyze File

```http
POST /api/analyze/file
```

FormData:
- `file`: <file>
- `user_id`: <optional_user_id>

Response: same as `/api/analyze/text`.
### Get All Results

```http
GET /api/results?page=1&limit=10&sort=recent
```

Response:

```json
{
  "status": "success",
  "page": 1,
  "limit": 10,
  "total_count": 42,
  "results": [...]
}
```
### Get Specific Result

```http
GET /api/results/{result_id}
```

Response:

```json
{
  "status": "success",
  "result": {
    "id": 1,
    "filename": "document.pdf",
    "overall_ai_score": 0.78,
    "overall_ai_score_percentage": "78.0%",
    ...
  }
}
```
### Delete Result

```http
DELETE /api/results/{result_id}
```

### Update Result

```http
PUT /api/results/{result_id}
Content-Type: application/json
```

```json
{
  "notes": "Manual review: likely AI",
  "is_flagged": true
}
```
### Get Statistics

```http
GET /api/statistics/summary
```

Response:

```json
{
  "status": "success",
  "summary": {
    "total_analyses": 42,
    "average_ai_score": 0.65,
    "total_text_analyzed": 125000,
    "likely_human": 15,
    "suspicious": 12,
    "likely_ai": 15
  }
}
```
### Get Configuration

```http
GET /api/config
```

Response:

```json
{
  "status": "success",
  "config": {
    "enabled_detectors": [
      "roberta", "perplexity", "llmdet", "hf_classifier"
    ],
    "aggregation_method": "weighted_average",
    "detector_weights": {...},
    "detector_info": {...}
  }
}
```
## Web Interface

Open `http://localhost:5000` in your browser to access the web UI.

Features:
- **Upload Section** - Drag-and-drop or click to upload files
- **Text Analysis** - Paste text directly
- **Results Dashboard** - View detailed analysis results
- **History Tab** - See all previous analyses
- **Statistics Tab** - View aggregate statistics
## How It Works

### Detection Process

1. **File Parsing** - Extracts text from PDF/DOCX/TXT files
2. **Text Cleaning** - Normalizes whitespace and formatting
3. **Detector Ensemble** - Runs enabled detectors in parallel
4. **Score Aggregation** - Combines detector scores using weighted average, max, or voting
5. **Result Storage** - Saves to database with full metadata
6. **Response** - Returns overall score and per-detector breakdown
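The text-cleaning step can be as simple as collapsing whitespace. A minimal sketch, assuming the pipeline only normalizes spacing:

```python
import re

def clean_text(raw: str) -> str:
    """Collapse runs of whitespace/newlines into single spaces and trim the edges."""
    return re.sub(r"\s+", " ", raw).strip()
```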
### Detector Details

#### RoBERTa Detector
- **Model**: `roberta-base-openai-detector`
- **Type**: Transformer-based classification
- **Output**: 0-1 probability score
- **Speed**: Medium

#### Perplexity Detector
- **Model**: GPT-2
- **Method**: Analyzes token probability distributions
- **Detects**: Repetitive patterns, unusual word choices
- **Output**: 0-1 score based on perplexity, repetition, and AI phrases

#### LLMDet Detector
- **Model**: BERT
- **Method**: Entropy and log-probability analysis
- **Detects**: Predictable sequences, unusual statistical patterns
- **Output**: 0-1 score from combined metrics

#### HF Classifier
- **Model**: Configurable (default: BERT)
- **Type**: Generic sequence classification
- **Output**: 0-1 probability score

#### OUTFOX Statistical
- **Type**: Statistical signature analysis
- **Detects**: Unusual word length distributions, sentence structure patterns, vocabulary diversity
- **Output**: 0-1 score from multiple statistical metrics
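One vocabulary-diversity signal of the kind such statistical detectors use is the type-token ratio (unique words over total words). This sketch is illustrative, not the detector's actual metric:

```python
def type_token_ratio(text: str) -> float:
    """Vocabulary diversity: distinct lowercase words divided by total words."""
    words = [w.lower() for w in text.split()]
    return len(set(words)) / max(len(words), 1)
```

Low ratios on long texts suggest repetitive vocabulary, one of several signals a statistical detector might combine.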
### Scoring

Default aggregation: **Weighted Average**

```
Overall Score = Σ (normalized_detector_score × weight)
```

Each detector's score is normalized to the 0-1 range, then multiplied by its configured weight. The sum is clamped to [0, 1].
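With the default weights from the Configuration section, the formula works out like this (a sketch of the calculation, not the ensemble manager's actual code):

```python
def aggregate_weighted(scores: dict, weights: dict) -> float:
    """Weighted average of per-detector scores, clamped to [0, 1]."""
    total = sum(weights.get(name, 0.0) * score for name, score in scores.items())
    return min(max(total, 0.0), 1.0)

scores = {"roberta": 0.85, "perplexity": 0.60, "llmdet": 0.70, "hf_classifier": 0.50}
weights = {"roberta": 0.30, "perplexity": 0.25, "llmdet": 0.25, "hf_classifier": 0.20}
overall = aggregate_weighted(scores, weights)  # 0.255 + 0.15 + 0.175 + 0.10 = 0.68
```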
### Confidence Levels

- **Very Low** (< 20%) - Almost certainly human-written
- **Low** (20-40%) - Probably human-written
- **Medium** (40-60%) - Uncertain
- **High** (60-80%) - Probably AI-generated
- **Very High** (> 80%) - Almost certainly AI-generated
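These bands map onto the ensemble score like so (the handling of exact boundary values such as 20% is an assumption; the project may round differently):

```python
def confidence_label(score: float) -> str:
    """Map a 0-1 ensemble score to the confidence bands listed above."""
    pct = score * 100
    if pct < 20:
        return "Very Low"
    if pct < 40:
        return "Low"
    if pct < 60:
        return "Medium"
    if pct <= 80:
        return "High"
    return "Very High"
```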
## Project Structure

```
slop-detect/
├── backend/
│   ├── config/
│   │   ├── settings.py           # App settings
│   │   └── detectors_config.py   # Detector configuration (FEATURE FLAGS HERE)
│   ├── detectors/
│   │   ├── base.py               # Base detector class
│   │   ├── roberta.py            # RoBERTa detector
│   │   ├── perplexity.py         # Perplexity detector
│   │   ├── llmdet.py             # LLMDet detector
│   │   ├── hf_classifier.py      # HF classifier
│   │   ├── outfox.py             # OUTFOX detector
│   │   └── ensemble.py           # Ensemble manager
│   ├── database/
│   │   ├── models.py             # SQLAlchemy models
│   │   └── db.py                 # Database manager
│   ├── api/
│   │   ├── routes.py             # Flask API routes
│   │   └── models.py             # Pydantic request/response models
│   ├── utils/
│   │   ├── file_parser.py        # PDF/DOCX/TXT parsing
│   │   └── highlighter.py        # Text highlighting utilities
│   ├── main.py                   # Flask app entry point
│   └── requirements.txt          # Python dependencies
├── frontend/
│   └── index.html                # Web UI (HTML + CSS + JS)
└── README.md                     # This file
```
## Customization

### Change Detector Weights

In `backend/config/detectors_config.py`:

```python
DETECTOR_WEIGHTS: Dict[str, float] = {
    "roberta": 0.40,  # Increase weight
    "perplexity": 0.30,
    "llmdet": 0.20,
    "hf_classifier": 0.10,
}
```

### Change Aggregation Method

In `backend/config/detectors_config.py`:

```python
AGGREGATION_METHOD = "max"  # Options: weighted_average, max, voting
```
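For reference, the three methods might behave roughly as follows (an illustrative sketch; the project's actual voting threshold is an assumption):

```python
def aggregate(scores: dict, weights: dict, method: str = "weighted_average") -> float:
    """Combine per-detector scores using the configured aggregation method."""
    values = list(scores.values())
    if method == "max":
        return max(values)  # the most suspicious single detector wins
    if method == "voting":
        # Fraction of detectors whose score crosses 0.5 ("votes AI")
        return sum(v > 0.5 for v in values) / len(values)
    total = sum(weights.get(name, 0.0) * v for name, v in scores.items())
    return min(max(total, 0.0), 1.0)
```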
### Use Different Models

In `backend/config/detectors_config.py`:

```python
ROBERTA_MODEL = "distilbert-base-uncased"
PERPLEXITY_MODEL = "gpt2-medium"
HF_CLASSIFIER_MODEL = "your-custom-model"
```
### Add Custom Detectors

1. Create a new file in `backend/detectors/`
2. Inherit from `BaseDetector`
3. Implement the `detect()` method
4. Add it to the `ensemble.py` initialization
5. Add it to `ENABLED_DETECTORS` in the config

Example:

```python
from detectors.base import BaseDetector, DetectorResult

class CustomDetector(BaseDetector):
    def __init__(self):
        super().__init__(name="custom")

    def detect(self, text: str) -> DetectorResult:
        # Your detection logic here
        score = calculate_ai_score(text)
        return DetectorResult(
            detector_name=self.name,
            score=score,
            explanation="Custom detection result"
        )
```
## Performance Tips

- **Model Caching** - Models are lazy-loaded and cached in memory
- **Parallel Detection** - Detectors can run in parallel (future enhancement)
- **Batch Processing** - Configure batch size for GPU processing
- **Disable Unused Detectors** - Reduce load by disabling detectors you don't need
## Troubleshooting

### Slow First Run
- Models need to be downloaded from the Hugging Face Hub
- Subsequent runs will use cached models
- The first model download can take 1-5 minutes

### Out of Memory
- Reduce the batch size in the config
- Disable memory-intensive detectors
- Run on a machine with more RAM

### Model Not Found

```
transformers.utils.RepositoryNotFoundError: Model not found
```

- The model name in the config is incorrect
- Check the Hugging Face Hub for the correct model name

### Database Locked

```
sqlite3.OperationalError: database is locked
```

- Close other connections to the database
- Ensure only one Flask instance is running
- Delete the `.db-journal` file if present
## Future Enhancements

- Parallel detector execution
- GPU support optimization
- Custom model fine-tuning
- Batch analysis API
- User authentication/authorization
- Document highlighting of suspicious sections
- Advanced filtering and search
- Export results to PDF/Excel
- API rate limiting
- Webhook notifications
## License

MIT License - feel free to use and modify.
## Support

For issues, questions, or suggestions, please open an issue on the project repository.