LLM-PII-Detection-Leaderboard

Luis Kalckstein

V1 including mock results

32e8dbc unverified 6 months ago

	---
	title: LLM PII Detection Leaderboard
	emoji: 🥇
	colorFrom: green
	colorTo: indigo
	sdk: gradio
	app_file: app.py
	pinned: true
	license: apache-2.0
	short_description: Duplicate this leaderboard to initialize your own!
	sdk_version: 5.19.0
	---

	# 🔒 LLM PII Detection Leaderboard

	A comprehensive benchmark for evaluating language models' performance in detecting and handling personally identifiable information (PII) across various document types and scenarios.

	## ✨ Features

	- Beautiful Modern UI: Elegant dark theme with gradient styling and smooth animations
	- Comprehensive Metrics: Precision, Recall, F1 Score, Over-detection Rate, Processing Time, and Cost
	- Domain-Specific Analysis: Specialized evaluation across Healthcare, Financial, Government, Legal, and Personal documents
	- Performance Cards: Professional model performance cards perfect for presentations and reports
	- Interactive Filtering: Filter by model type, document type, and sort by any metric
	- Real-time Updates: Dynamic table updates and score visualizations

	## 🚀 Quick Start

	### Installation

	```bash
	git clone https://github.com/your-username/LLM-PII-Detection-Leaderboard.git
	cd LLM-PII-Detection-Leaderboard
	pip install -r requirements.txt
	```

	### Run the Application

	```bash
	python app.py
	```

	By default, Gradio serves the leaderboard at `http://localhost:7860`.

	## 📊 Key Metrics

	- Overall Accuracy: Percentage of correctly identified and classified PII entities
	- Precision: Of all flagged items, how many were actually PII (avoiding false positives)
	- Recall: Of all PII present, how many were successfully detected (avoiding false negatives)
	- F1 Score: Harmonic mean balancing precision and recall
	- Over-detection Rate: Percentage of non-PII items incorrectly flagged as PII (lower is better)
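The definitions above can be sketched from raw confusion counts. This is a minimal illustration, not the leaderboard's own scoring code: the `DetectionCounts` container and `compute_metrics` function are hypothetical names, and the over-detection rate is assumed to be false positives divided by all non-PII items, which is one reading of the definition above.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Confusion counts for one model on one document set (hypothetical container)."""
    true_positives: int   # PII items correctly flagged
    false_positives: int  # non-PII items flagged as PII
    false_negatives: int  # PII items that were missed
    true_negatives: int   # non-PII items correctly left alone


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def compute_metrics(c: DetectionCounts) -> dict:
    """Compute precision, recall, F1, and over-detection rate as fractions in [0, 1]."""
    precision = safe_div(c.true_positives, c.true_positives + c.false_positives)
    recall = safe_div(c.true_positives, c.true_positives + c.false_negatives)
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Assumption: over-detection rate = share of non-PII items that were flagged.
    over_detection = safe_div(c.false_positives, c.false_positives + c.true_negatives)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "over_detection_rate": over_detection,
    }


if __name__ == "__main__":
    # Illustrative counts only, not real benchmark results.
    counts = DetectionCounts(true_positives=80, false_positives=20,
                             false_negatives=20, true_negatives=180)
    for name, value in compute_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

Returning fractions rather than percentages keeps the arithmetic simple; multiply by 100 at display time if the table shows percentages.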

	## 🏗️ Project Structure

	```
	LLM-PII-Detection-Leaderboard/
	├── app.py # Main application entry point
	├── pii_leaderboard.py # Core leaderboard functionality
	├── data_loader.py # Data loading and styling configuration
	├── requirements.txt # Python dependencies
	└── README.md # This file
	```

	## 🎨 Design Philosophy

	This leaderboard combines the slim architecture of agent-leaderboard with the beautiful design elements from DocumentProcessing Leaderboard Nutrient, featuring:

	- Minimal Dependencies: Only essential packages (Gradio, Pandas, NumPy)
	- Clean Architecture: Simple, maintainable code structure
	- Professional Styling: Modern dark theme with custom color palette
	- Interactive Elements: Score bars, rank badges, and performance cards
	- Responsive Design: Layout adapts to different screen sizes

	## 🔧 Customization

	### Adding New Models

	Update the `sample_data` dictionary in `data_loader.py` with your model's performance metrics.
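As a rough sketch of what such an update could look like: the example below assumes `sample_data` is a column-oriented dictionary (column name mapped to a list of values), which is a common shape for building a Pandas DataFrame. The actual layout in `data_loader.py` may differ, and the column names, the `add_model` helper, and all values shown here are hypothetical placeholders.

```python
def add_model(sample_data: dict, row: dict) -> None:
    """Append one model's row to a column-oriented dict, keeping columns aligned.

    Raises ValueError if the row's keys do not match the existing columns,
    so a typo cannot leave the columns with different lengths.
    """
    missing = set(sample_data) - set(row)
    extra = set(row) - set(sample_data)
    if missing or extra:
        raise ValueError(f"missing columns: {sorted(missing)}, unknown columns: {sorted(extra)}")
    for column, value in row.items():
        sample_data[column].append(value)


# Hypothetical structure and placeholder values, not real results.
sample_data = {
    "Model": ["example-model-a"],
    "Precision": [0.90],
    "Recall": [0.85],
    "F1 Score": [0.874],
    "Over-detection Rate": [0.05],
}

add_model(sample_data, {
    "Model": "example-model-b",
    "Precision": 0.88,
    "Recall": 0.91,
    "F1 Score": 0.895,
    "Over-detection Rate": 0.07,
})
```

Validating the keys before appending means a malformed entry fails early instead of producing a misaligned table.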

	### Changing Colors

	Modify the `COLORS` dictionary in `data_loader.py` to customize the color scheme.

	### Adding New Metrics

	1. Add the metric to your data structure
	2. Update the table generation in `pii_leaderboard.py`
	3. Add appropriate styling and score bars
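For step 3, a score bar can be as simple as an HTML snippet whose width tracks the metric value. The helper below is a hypothetical sketch, not the code in `pii_leaderboard.py`: the `score_bar_html` name, the CSS class, and the default color are assumptions made for illustration.

```python
from html import escape


def score_bar_html(label: str, value: float, color: str = "#22c55e") -> str:
    """Render a metric in [0, 1] as a small HTML bar with a percentage label.

    Values outside [0, 1] are clamped so the bar never overflows its container.
    """
    clamped = max(0.0, min(1.0, value))
    percent = round(clamped * 100, 1)
    return (
        f'<div class="score-bar" title="{escape(label)}">'
        f'<div style="width:{percent}%;background:{color};height:8px"></div>'
        f"<span>{percent}%</span></div>"
    )


if __name__ == "__main__":
    print(score_bar_html("F1 Score", 0.875))
```

For a metric where lower is better, such as Over-detection Rate, consider a different color or an inverted fill so that a long bar does not read as a good score.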

	## 📈 Performance

	The leaderboard currently evaluates 8 leading language models across:
	- 5 Document Types: Healthcare, Financial, Government, Legal, Personal
	- 7 Key Metrics: Accuracy, Precision, Recall, F1, Over-detection Rate, Processing Time, Cost
	- Real-world Scenarios: Synthetic industry documents with embedded PII

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes
	4. Test thoroughly
	5. Submit a pull request

	## 📄 License

	This project is licensed under the Apache License 2.0, matching the `license: apache-2.0` field in the Space metadata; see the LICENSE file for details.

	## 🙏 Acknowledgments

	- Inspired by the elegant design of DocumentProcessing Leaderboard Nutrient
	- Built with the slim architecture approach of agent-leaderboard
	- Powered by Gradio for the beautiful web interface