Reproducible datasets for privacy-preserving natural language processing, including PII detection benchmarks and ablation study resources.