DomainShield
Privacy protection for LLM pipelines
DomainShield is a research project focused on preventing sensitive data leakage when using external large language model APIs.
Overview
The system acts as a middleware firewall:
- Masks sensitive information before sending data to external LLMs
- Handles both general PII and domain-specific sensitive entities
- Reconstructs the original content after receiving the response
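The mask, send, and reconstruct cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the names `mask`, `unmask`, and `fake_llm` are hypothetical, a simple regex stands in for PII detection, a caller-supplied term list stands in for domain-specific entities, and the external API is replaced by a local stub.

```python
import re

# Deliberately simple email pattern, used only for illustration.
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def mask(text, domain_terms=()):
    """Replace emails and listed domain terms with placeholders.

    Returns the masked text and a placeholder -> original mapping.
    """
    parts = [f"(?P<EMAIL>{EMAIL_PATTERN})"]
    if domain_terms:
        # Longest terms first so a longer term wins over its own prefix.
        terms = sorted(domain_terms, key=len, reverse=True)
        parts.append("(?P<TERM>" + "|".join(re.escape(t) for t in terms) + ")")
    pattern = re.compile("|".join(parts))

    mapping = {}   # placeholder -> original value
    assigned = {}  # original value -> placeholder (repeats reuse one placeholder)

    def replace(match):
        value = match.group(0)
        if value not in assigned:
            placeholder = f"[{match.lastgroup}_{len(assigned) + 1}]"
            assigned[value] = placeholder
            mapping[placeholder] = value
        return assigned[value]

    # A single pass avoids re-matching text inside placeholders already inserted.
    return pattern.sub(replace, text), mapping


def unmask(text, mapping):
    """Restore the original values in a response that contains placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def fake_llm(prompt):
    """Stand-in for an external LLM API call (assumption: it echoes placeholders)."""
    return "Summary: " + prompt


masked, mapping = mask(
    "Contact alice@example.com about Project Falcon.",
    domain_terms=["Project Falcon"],
)
response = fake_llm(masked)        # only masked text leaves the pipeline
restored = unmask(response, mapping)
```

Reconstruction in this sketch works only if the external model returns the placeholders unchanged; a model that paraphrases or drops them would need a more tolerant restore step.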
Key Focus
- PII masking (names, emails, identifiers)
- Domain-specific entity protection (internal terms, codes, private vocabularies)
- Multilingual robustness under noisy conditions
- Comparison of adaptation methods (prompting, RAG, fine-tuning, NER)
Approach
We evaluate multiple strategies for detecting and masking sensitive data:
- Prompt-based methods
- Retrieval-augmented approaches (RAG)
- Supervised fine-tuning with LoRA adapters
- Token classification (NER)
- Hybrid and ensemble methods
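As one illustration of what a hybrid or ensemble method could look like, the sketch below takes character spans from several detectors and merges them by union, so that any span flagged by at least one detector gets masked. This is an assumed, recall-oriented combination rule chosen for the example, not necessarily the one the project evaluates; `regex_detector`, `lexicon_detector`, and `union_spans` are hypothetical names, and the two toy detectors stand in for the prompt-based, fine-tuned, or NER models listed above.

```python
import re


def regex_detector(text):
    """Toy PII detector: returns (start, end, label) spans for email-like strings."""
    pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"
    return [(m.start(), m.end(), "EMAIL") for m in re.finditer(pattern, text)]


def lexicon_detector(text, terms=("Project Falcon",)):
    """Toy domain detector: exact matches against a list of private terms."""
    spans = []
    for term in terms:
        for m in re.finditer(re.escape(term), text):
            spans.append((m.start(), m.end(), "TERM"))
    return spans


def union_spans(span_lists):
    """Merge spans from all detectors; overlapping spans collapse into one.

    The merged span keeps the label of the earliest-starting span.
    """
    spans = sorted(span for spans in span_lists for span in spans)
    merged = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


text = "Mail bob@corp.io re Project Falcon"
spans = union_spans([regex_detector(text), lexicon_detector(text)])
flagged = [text[start:end] for start, end, _ in spans]
```

Union favors recall, which suits a privacy filter where a missed entity is costlier than an over-masked one; a voting or intersection rule would trade some of that recall for precision.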
Status
Active research project. Models, benchmarks, and demos are planned but not yet released.