# DomainShield

**Privacy protection for LLM pipelines**

DomainShield is a research project focused on preventing sensitive data leakage when using external large language model APIs.

## Overview

The system acts as a middleware firewall that:

- Masks sensitive information before data is sent to an external LLM
- Handles both general PII and domain-specific sensitive entities
- Reconstructs the original content after the response is received

## Key Focus

- PII masking (names, email addresses, identifiers)
- Domain-specific entity protection (internal terms, codes, private vocabularies)
- Multilingual robustness under noisy conditions
- Comparison of adaptation methods (prompting, RAG, fine-tuning, NER)

## Approach

We evaluate multiple strategies for detecting and masking sensitive data:

- Prompt-based methods
- Retrieval-augmented approaches (RAG)
- Supervised fine-tuning (LoRA)
- Token classification (NER)
- Hybrid and ensemble methods

## Status

Active research project. Models, benchmarks, and demos are coming soon.
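## Masking round trip (sketch)

The mask, send, and reconstruct loop described in the Overview can be illustrated with a minimal Python sketch. It is not DomainShield's implementation. Detection here is a hypothetical regex for email addresses plus exact matching against a caller-supplied list of domain terms, standing in for the prompting, RAG, LoRA, NER, and ensemble detectors the project evaluates. The names `mask`, `unmask`, and `shielded_call`, the `[LABEL_n]` placeholder format, and the stand-in `llm` callable are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical detector: a simple email regex. Real PII detection
# (names, identifiers, multilingual text) needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text: str, domain_terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace sensitive spans with placeholders; return masked text and token->value map."""
    mapping: Dict[str, str] = {}   # placeholder -> original value
    reverse: Dict[str, str] = {}   # original value -> placeholder
    counts: Dict[str, int] = {}    # per-label counter for numbering

    def placeholder(value: str, label: str) -> str:
        # Reuse one placeholder per distinct value so repeats stay consistent.
        if value not in reverse:
            counts[label] = counts.get(label, 0) + 1
            token = f"[{label}_{counts[label]}]"
            mapping[token] = value
            reverse[value] = token
        return reverse[value]

    masked = EMAIL_RE.sub(lambda m: placeholder(m.group(0), "EMAIL"), text)
    # Exact, case-sensitive matching of domain terms; longest first so a
    # longer term is not split by a shorter one it contains. Empty terms skipped.
    for term in sorted(filter(None, domain_terms), key=len, reverse=True):
        masked = re.sub(re.escape(term), lambda m: placeholder(m.group(0), "TERM"), masked)
    return masked, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values in a single pass over all known placeholders."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(token) for token in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def shielded_call(prompt: str, llm: Callable[[str], str], domain_terms: List[str]) -> str:
    """Mask the prompt, call the external model, and reconstruct its response."""
    masked, mapping = mask(prompt, domain_terms)
    response = llm(masked)  # only the masked text leaves the trust boundary
    return unmask(response, mapping)


if __name__ == "__main__":
    # Stand-in for an external LLM API: echoes what it receives.
    def echo_llm(p: str) -> str:
        return "Reply: " + p

    prompt = "Ask alice@example.com about Project Falcon budget."
    print(mask(prompt, ["Project Falcon"])[0])
    print(shielded_call(prompt, echo_llm, ["Project Falcon"]))
```

Reusing one placeholder per distinct value keeps repeated mentions consistent from the model's point of view, and restoring in a single regex pass avoids re-substituting text that was itself just restored. A real deployment would likely also need to handle placeholders that the model drops, rewrites, or invents.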