metadata
title: LLM PII Detection Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.19.0
LLM PII Detection Leaderboard
A benchmark for evaluating how well language models detect and handle personally identifiable information (PII) across a range of document types and scenarios.
Features
- Beautiful Modern UI: Elegant dark theme with gradient styling and smooth animations
- Comprehensive Metrics: Precision, Recall, F1 Score, Over-detection Rate, Processing Time, and Cost
- Domain-Specific Analysis: Specialized evaluation across Healthcare, Financial, Government, Legal, and Personal documents
- Performance Cards: Professional model performance cards perfect for presentations and reports
- Interactive Filtering: Filter by model type, document type, and sort by any metric
- Real-time Updates: Dynamic table updates and score visualizations
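The filtering and sorting behavior listed above can be sketched in plain Python. This is a minimal illustration, not the code in pii_leaderboard.py: the row layout, the field names (`model_type`, `doc_type`, `f1`), the placeholder values, and the `filter_and_sort` helper are all assumptions made for the example.

```python
# Hypothetical rows: one entry per (model, document type) pair.
# The values are placeholders, not real benchmark results.
rows = [
    {"model": "model-a", "model_type": "Proprietary", "doc_type": "Healthcare", "f1": 0.91},
    {"model": "model-a", "model_type": "Proprietary", "doc_type": "Legal", "f1": 0.85},
    {"model": "model-b", "model_type": "Open source", "doc_type": "Healthcare", "f1": 0.88},
    {"model": "model-b", "model_type": "Open source", "doc_type": "Legal", "f1": 0.90},
]


def filter_and_sort(rows, model_type=None, doc_type=None, sort_by="f1", descending=True):
    """Keep rows matching the optional filters, then sort by one metric."""
    selected = [
        r for r in rows
        if (model_type is None or r["model_type"] == model_type)
        and (doc_type is None or r["doc_type"] == doc_type)
    ]
    return sorted(selected, key=lambda r: r[sort_by], reverse=descending)


# Example: Healthcare results only, best F1 first.
for row in filter_and_sort(rows, doc_type="Healthcare"):
    print(row["model"], row["f1"])
```

Passing `None` for a filter leaves that dimension unrestricted, which mirrors an "All" option in a dropdown.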
Quick Start
Installation
git clone https://github.com/your-username/LLM-PII-Detection-Leaderboard.git
cd LLM-PII-Detection-Leaderboard
pip install -r requirements.txt
Run the Application
python app.py
The leaderboard will be available at http://localhost:7860
Key Metrics
- Overall Accuracy: Percentage of correctly identified and classified PII entities
- Precision: Of all flagged items, the fraction that were actually PII (higher precision means fewer false positives)
- Recall: Of all PII present, the fraction that was successfully detected (higher recall means fewer false negatives)
- F1 Score: Harmonic mean of precision and recall
- Over-detection Rate: Percentage of non-PII incorrectly flagged (lower is better)
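As a worked example of the definitions above, the sketch below derives precision, recall, F1, and over-detection rate from raw counts. It assumes evaluation over candidate spans with known true/false positive and negative counts, and it reads over-detection rate as FP / (FP + TN); the leaderboard's own scoring code may define these differently. The `Counts` class and `pii_metrics` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts over candidate spans (an assumed evaluation unit)."""
    tp: int  # PII spans correctly flagged
    fp: int  # non-PII spans incorrectly flagged
    fn: int  # PII spans that were missed
    tn: int  # non-PII spans correctly left alone


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def pii_metrics(c: Counts) -> dict:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    # Harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Assumption: over-detection rate is the share of non-PII that was flagged.
    over_detection = safe_div(c.fp, c.fp + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "over_detection_rate": over_detection,
    }


# Placeholder counts, not real benchmark data.
example = pii_metrics(Counts(tp=80, fp=20, fn=10, tn=180))
print(example)
```

With these placeholder counts, precision is 80/100 and over-detection rate is 20/200, so a model can have fairly high precision while still flagging one in ten non-PII spans.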
Project Structure
LLM-PII-Detection-Leaderboard/
├── app.py               # Main application entry point
├── pii_leaderboard.py   # Core leaderboard functionality
├── data_loader.py       # Data loading and styling configuration
├── requirements.txt     # Python dependencies
└── README.md            # This file
Design Philosophy
This leaderboard combines the slim architecture of agent-leaderboard with the beautiful design elements from DocumentProcessing Leaderboard Nutrient, featuring:
- Minimal Dependencies: Only essential packages (Gradio, Pandas, NumPy)
- Clean Architecture: Simple, maintainable code structure
- Professional Styling: Modern dark theme with custom color palette
- Interactive Elements: Score bars, rank badges, and performance cards
- Responsive Design: Works beautifully on all screen sizes
Customization
Adding New Models
Update the sample_data dictionary in data_loader.py with your model's performance metrics.
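The exact layout of sample_data is defined in data_loader.py and is not reproduced here. As a rough sketch, assuming it is a column-oriented dictionary (one list per column, the shape pandas accepts directly), adding a model could look like the following. The column names, the placeholder values, and the `add_model` helper are hypothetical.

```python
# Assumed shape: each key is a column, each list holds one value per model.
# Column names and numbers are placeholders, not the real dataset.
sample_data = {
    "Model": ["model-a"],
    "Model Type": ["Proprietary"],
    "Precision": [0.92],
    "Recall": [0.88],
    "F1 Score": [0.90],
}


def add_model(data: dict, entry: dict) -> None:
    """Append one model to every column, refusing mismatched keys."""
    missing = set(data) - set(entry)
    extra = set(entry) - set(data)
    if missing or extra:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}"
        )
    for column, values in data.items():
        values.append(entry[column])


add_model(sample_data, {
    "Model": "my-new-model",
    "Model Type": "Open source",
    "Precision": 0.85,
    "Recall": 0.91,
    "F1 Score": 0.88,
})
```

Checking the keys before appending keeps all columns the same length, which matters if the dictionary is later turned into a table.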
Changing Colors
Modify the COLORS dictionary in data_loader.py to customize the color scheme.
Adding New Metrics
- Add the metric to your data structure
- Update the table generation in pii_leaderboard.py
- Add appropriate styling and score bars
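To illustrate the styling step, here is a minimal sketch of a score bar rendered as an HTML string for a metric in the 0 to 1 range. It is not the rendering code from pii_leaderboard.py: the `score_bar` function, its `lower_is_better` flag, and the CSS class names are assumptions for the example.

```python
from html import escape


def score_bar(label: str, value: float, lower_is_better: bool = False) -> str:
    """Render a 0..1 metric as a simple HTML bar (hypothetical markup)."""
    clamped = min(max(value, 0.0), 1.0)
    # For metrics such as over-detection rate, a lower value fills more of the bar.
    fill = 1.0 - clamped if lower_is_better else clamped
    width = round(fill * 100, 1)
    return (
        f'<div class="score-bar" title="{escape(label)}">'
        f'<div class="score-fill" style="width: {width}%"></div>'
        f"<span>{value:.1%}</span></div>"
    )


# Placeholder value for a new metric where lower is better.
print(score_bar("Over-detection Rate", 0.1, lower_is_better=True))
```

Inverting the fill for lower-is-better metrics keeps "longer bar means better" consistent across columns, while the label still shows the raw value.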
Performance
The leaderboard currently evaluates 8 leading language models across:
- 5 Document Types: Healthcare, Financial, Government, Legal, Personal
- 6 Key Metrics: Accuracy, Precision, Recall, F1, Over-detection Rate, Cost & Time
- Real-world Scenarios: Synthetic industry documents with embedded PII
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the elegant design of DocumentProcessing Leaderboard Nutrient
- Built with the slim architecture approach of agent-leaderboard
- Powered by Gradio for the beautiful web interface