LLM-PII-Detection-Leaderboard

Luis Kalckstein

V1 including mock results

32e8dbc unverified 6 months ago

metadata

title: LLM PII Detection Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.19.0

🔒 LLM PII Detection Leaderboard

A benchmark for evaluating how well language models detect and handle personally identifiable information (PII) across document types and scenarios.

✨ Features

Beautiful Modern UI: Elegant dark theme with gradient styling and smooth animations
Comprehensive Metrics: Precision, Recall, F1 Score, Over-detection Rate, Processing Time, and Cost
Domain-Specific Analysis: Specialized evaluation across Healthcare, Financial, Government, Legal, and Personal documents
Performance Cards: Professional model performance cards perfect for presentations and reports
Interactive Filtering: Filter by model type, document type, and sort by any metric
Real-time Updates: Dynamic table updates and score visualizations

🚀 Quick Start

Installation

git clone https://github.com/your-username/LLM-PII-Detection-Leaderboard.git
cd LLM-PII-Detection-Leaderboard
pip install -r requirements.txt

Run the Application

python app.py

The leaderboard will be available at http://localhost:7860

📊 Key Metrics

Overall Accuracy: Percentage of correctly identified and classified PII entities
Precision: Of all flagged items, the fraction that were actually PII (higher precision means fewer false positives)
Recall: Of all PII present, the fraction that was detected (higher recall means fewer false negatives)
F1 Score: Harmonic mean of precision and recall
Over-detection Rate: Percentage of non-PII items incorrectly flagged as PII (lower is better)
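
As a minimal sketch of how these metrics relate, the following computes them from entity-level confusion counts. The function name pii_metrics and its arguments are hypothetical and not taken from the repository. The over-detection rate is assumed here to be the false-positive rate, FP / (FP + TN); the leaderboard may define it differently. Overall Accuracy is left out because it also depends on how entities are classified.

```python
def pii_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute detection metrics from confusion counts (hypothetical helper).

    tp: PII items correctly flagged
    fp: non-PII items incorrectly flagged
    fn: PII items that were missed
    tn: non-PII items correctly left alone
    """
    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumption: over-detection rate is the false-positive rate.
        "over_detection_rate": ratio(fp, fp + tn),
    }


metrics = pii_metrics(tp=8, fp=2, fn=2, tn=88)
```

With these illustrative counts, precision and recall are both 8 / 10, so the F1 score is the same value, while only 2 of the 90 non-PII items are flagged.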

🏗️ Project Structure

LLM-PII-Detection-Leaderboard/
├── app.py                 # Main application entry point
├── pii_leaderboard.py     # Core leaderboard functionality
├── data_loader.py         # Data loading and styling configuration
├── requirements.txt       # Python dependencies
└── README.md              # This file

🎨 Design Philosophy

This leaderboard combines the slim architecture of agent-leaderboard with the beautiful design elements from DocumentProcessing Leaderboard Nutrient, featuring:

Minimal Dependencies: Only essential packages (Gradio, Pandas, NumPy)
Clean Architecture: Simple, maintainable code structure
Professional Styling: Modern dark theme with custom color palette
Interactive Elements: Score bars, rank badges, and performance cards
Responsive Design: Works beautifully on all screen sizes

🔧 Customization

Adding New Models

Update the sample_data dictionary in data_loader.py with your model's performance metrics.
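
The exact layout of sample_data is not shown in this README, so the sketch below assumes a column-oriented dictionary (one list per column, the shape pandas accepts for a DataFrame). The column names, the add_model helper, and all numeric values are placeholders for illustration, not real results.

```python
def add_model(sample_data: dict, name: str, metrics: dict) -> None:
    """Append one model row to a column-oriented dict, in place.

    Hypothetical helper: assumes every column is a list and that
    "Model" holds the model names.
    """
    expected = set(sample_data) - {"Model"}
    if set(metrics) != expected:
        raise ValueError(f"metrics must have exactly these keys: {sorted(expected)}")
    sample_data["Model"].append(name)
    for column, value in metrics.items():
        sample_data[column].append(value)


# Placeholder values for illustration only.
sample_data = {
    "Model": ["example-model-a"],
    "Precision": [0.90],
    "Recall": [0.85],
    "F1 Score": [0.874],
}

add_model(
    sample_data,
    "example-model-b",
    {"Precision": 0.80, "Recall": 0.95, "F1 Score": 0.869},
)
```

Checking the keys before appending keeps every column the same length, which matters if the dictionary is later turned into a table.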

Changing Colors

Modify the COLORS dictionary in data_loader.py to customize the color scheme.
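
A rough sketch of what such a change could look like follows. The key names and hex values are assumptions, since the actual contents of COLORS are not listed here, and with_overrides is a hypothetical helper for catching misspelled keys or malformed values before they reach the UI.

```python
import re

# Hypothetical palette: check data_loader.py for the keys the app really uses.
COLORS = {
    "background": "#0f172a",
    "accent": "#22c55e",
    "text": "#e2e8f0",
}

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def with_overrides(colors: dict, **overrides: str) -> dict:
    """Return a copy of colors with overrides applied."""
    for key, value in overrides.items():
        if key not in colors:
            raise KeyError(f"unknown color key: {key}")
        if not HEX_COLOR.fullmatch(value):
            raise ValueError(f"not a #rrggbb color: {value!r}")
    # The original dictionary is left unchanged.
    return {**colors, **overrides}


custom = with_overrides(COLORS, accent="#6366f1")
```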

Adding New Metrics

Add the metric to your data structure
Update the table generation in pii_leaderboard.py
Add appropriate styling and score bars
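
For the last step, a score bar can be as simple as a nested pair of HTML elements whose inner width tracks the score. This is a sketch under assumptions: score_bar_html is a hypothetical name, the colors are placeholders, and the real rendering code in pii_leaderboard.py may be structured differently.

```python
from html import escape


def score_bar_html(label: str, value: float, color: str = "#22c55e") -> str:
    """Render a score in the range 0..1 as a simple HTML bar (hypothetical helper)."""
    # Clamp out-of-range scores so the bar never overflows its track.
    clamped = max(0.0, min(1.0, value))
    width = f"{clamped * 100:.1f}%"
    return (
        f'<div title="{escape(label)}" style="background:#1e293b;border-radius:4px">'
        f'<div style="width:{width};background:{color};height:8px;border-radius:4px"></div>'
        f"</div>"
    )


bar = score_bar_html("Recall", 0.75)
```

For metrics where lower is better, such as the over-detection rate, one option is to pass 1 - value so that a longer bar still means a better result.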

📈 Performance

The leaderboard currently evaluates 8 leading language models across:

5 Document Types: Healthcare, Financial, Government, Legal, Personal
6 Key Metrics: Accuracy, Precision, Recall, F1, Over-detection Rate, Cost & Time
Realistic Scenarios: Synthetic industry documents with embedded PII

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📄 License

This project is licensed under the Apache License 2.0, matching the license field (apache-2.0) in the Space metadata. See the LICENSE file for details.

🙏 Acknowledgments

Inspired by the elegant design of DocumentProcessing Leaderboard Nutrient
Built with the slim architecture approach of agent-leaderboard
Powered by Gradio for the beautiful web interface