---
title: LLM PII Detection Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.19.0
---

# 🔒 LLM PII Detection Leaderboard

A comprehensive benchmark for evaluating language models' performance in detecting and handling personally identifiable information (PII) across various document types and scenarios.

## ✨ Features

- **Beautiful Modern UI**: Elegant dark theme with gradient styling and smooth animations
- **Comprehensive Metrics**: Precision, Recall, F1 Score, Over-detection Rate, Processing Time, and Cost
- **Domain-Specific Analysis**: Specialized evaluation across Healthcare, Financial, Government, Legal, and Personal documents
- **Performance Cards**: Professional model performance cards perfect for presentations and reports
- **Interactive Filtering**: Filter by model type, document type, and sort by any metric
- **Real-time Updates**: Dynamic table updates and score visualizations

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/your-username/LLM-PII-Detection-Leaderboard.git
cd LLM-PII-Detection-Leaderboard
pip install -r requirements.txt
```

### Run the Application

```bash
python app.py
```

The leaderboard will be available at `http://localhost:7860`

## 📊 Key Metrics

- **Overall Accuracy**: Percentage of correctly identified and classified PII entities
- **Precision**: Of all flagged items, how many were actually PII (avoiding false positives)
- **Recall**: Of all PII present, how many were successfully detected (avoiding false negatives)
- **F1 Score**: Harmonic mean balancing precision and recall
- **Over-detection Rate**: Percentage of non-PII incorrectly flagged (lower is better)

## 🏗️ Project Structure

```
LLM-PII-Detection-Leaderboard/
├── app.py                 # Main application entry point
├── pii_leaderboard.py     # Core leaderboard functionality
├── data_loader.py         # Data loading and styling configuration
├── requirements.txt       # Python dependencies
└── README.md              # This file
```

## 🎨 Design Philosophy

This leaderboard combines the slim architecture of agent-leaderboard with the beautiful design elements from DocumentProcessing Leaderboard Nutrient, featuring:

- **Minimal Dependencies**: Only essential packages (Gradio, Pandas, NumPy)
- **Clean Architecture**: Simple, maintainable code structure
- **Professional Styling**: Modern dark theme with custom color palette
- **Interactive Elements**: Score bars, rank badges, and performance cards
- **Responsive Design**: Works beautifully on all screen sizes

## 🔧 Customization

### Adding New Models

Update the `sample_data` dictionary in `data_loader.py` with your model's performance metrics.

### Changing Colors

Modify the `COLORS` dictionary in `data_loader.py` to customize the color scheme.

### Adding New Metrics

1. Add the metric to your data structure
2. Update the table generation in `pii_leaderboard.py`
3. Add appropriate styling and score bars

## 📈 Performance

The leaderboard currently evaluates 8 leading language models across:

- **5 Document Types**: Healthcare, Financial, Government, Legal, Personal
- **6 Key Metrics**: Accuracy, Precision, Recall, F1, Over-detection Rate, Cost & Time
- **Real-world Scenarios**: Synthetic industry documents with embedded PII

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Inspired by the elegant design of DocumentProcessing Leaderboard Nutrient
- Built with the slim architecture approach of agent-leaderboard
- Powered by Gradio for the beautiful web interface
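
## 🧮 Metric Computation Sketch

The definitions under Key Metrics can be illustrated with a short sketch. This is not the leaderboard's own scoring code: the `Counts` class and `pii_metrics` function are hypothetical names, and the sketch assumes that per-item true/false positive and negative counts are available. Over-detection Rate is read here as false positives divided by all non-PII items, which is one plausible interpretation of "percentage of non-PII incorrectly flagged". Overall Accuracy is left out because this README does not state its denominator.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical confusion counts for one model on one document set."""
    tp: int  # PII items correctly flagged
    fp: int  # non-PII items incorrectly flagged
    fn: int  # PII items that were missed
    tn: int  # non-PII items correctly left unflagged


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def pii_metrics(c: Counts) -> dict:
    """Compute precision, recall, F1, and over-detection rate as fractions in [0, 1]."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Assumed reading: share of non-PII items that were flagged anyway.
    over_detection_rate = safe_div(c.fp, c.fp + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "over_detection_rate": over_detection_rate,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in pii_metrics(Counts(tp=80, fp=20, fn=20, tn=180)).items():
        print(f"{name}: {value:.3f}")
```

Multiply the returned fractions by 100 to display them as percentages. The zero-denominator guard keeps a model that flags nothing from raising an error; whether such a model should score 0 or be excluded is a design choice this sketch does not settle.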