# DomainShield

**Privacy protection for LLM pipelines**

DomainShield is a research project on preventing sensitive data from leaking when applications send text to external large language model (LLM) APIs.

## Overview

DomainShield sits between an application and external LLM APIs as a middleware firewall. It:
- Masks sensitive information before sending data to external LLMs
- Handles both general PII and domain-specific sensitive entities
- Reconstructs the original content by restoring the masked values in the LLM's response
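The mask, send, and reconstruct steps above can be sketched as a placeholder round trip. This is a minimal sketch, not DomainShield's implementation: the email regex, the `[KIND_n]` placeholder format, and the helper names `mask` and `unmask` are all assumptions, and the external LLM call is replaced by a hard-coded reply.

```python
import re
from typing import Dict, List, Tuple

# Assumed PII detector for illustration only: a deliberately simple email regex.
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
# Matches the hypothetical placeholders produced by mask(), e.g. "[EMAIL_1]", "[TERM_2]".
PLACEHOLDER = re.compile(r"\[[A-Z]+_\d+\]")


def build_pattern(domain_terms: List[str]) -> re.Pattern:
    """Combine all detectors into one alternation so each character is matched at most once."""
    parts = [f"(?P<EMAIL>{EMAIL})"]
    terms = sorted({t for t in domain_terms if t}, key=len, reverse=True)
    if terms:
        # Longest terms first: at a given position the first matching alternative wins.
        parts.append("(?P<TERM>" + "|".join(map(re.escape, terms)) + ")")
    return re.compile("|".join(parts))


def mask(text: str, domain_terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace sensitive spans with typed placeholders; return masked text and mapping."""
    mapping: Dict[str, str] = {}  # placeholder -> original value (stays local)
    seen: Dict[str, str] = {}     # original value -> placeholder
    counts: Dict[str, int] = {}

    def repl(m: re.Match) -> str:
        value, kind = m.group(0), m.lastgroup
        if value not in seen:  # repeated values reuse the same placeholder
            counts[kind] = counts.get(kind, 0) + 1
            seen[value] = f"[{kind}_{counts[kind]}]"
            mapping[seen[value]] = value
        return seen[value]

    return build_pattern(domain_terms).sub(repl, text), mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values; placeholders not in the mapping are left untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    masked, mapping = mask("Email alice@example.com about Project Falcon.", ["Project Falcon"])
    print(masked)  # Email [EMAIL_1] about [TERM_1].
    # Stand-in for the external LLM call: a reply that reuses the placeholders.
    reply = "Drafted a note to [EMAIL_1] about [TERM_1]."
    print(unmask(reply, mapping))  # Drafted a note to alice@example.com about Project Falcon.
```

The mapping never leaves the local process, so the external API only sees placeholders. Typed placeholders may also give the model enough context to phrase a sensible answer. A real pipeline would additionally need to cope with placeholders the model drops or rewrites, and with input that already contains placeholder-like strings.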

## Key Focus

- PII masking (names, emails, identifiers)
- Domain-specific entity protection (internal terms, codes, private vocabularies)
- Multilingual robustness under noisy conditions
- Comparison of adaptation methods (prompting, RAG, fine-tuning, NER)
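For private vocabularies, a dictionary matcher is a natural baseline, and some of the noise in multilingual input can be absorbed by normalizing before matching. The sketch below assumes that Unicode NFKC normalization, case-insensitive matching, and flexible whitespace between words are appropriate; `compile_vocab` and `find_terms` are hypothetical names, and the project's own detectors may work quite differently.

```python
import re
import unicodedata
from typing import List, Tuple


def normalize(text: str) -> str:
    # NFKC folds compatibility variants, e.g. fullwidth "ＡＢＣ" becomes "ABC".
    return unicodedata.normalize("NFKC", text)


def compile_vocab(terms: List[str]) -> re.Pattern:
    """Build one case-insensitive pattern for a private vocabulary (hypothetical helper)."""
    alts = []
    normalized = {normalize(t).strip() for t in terms if t.strip()}
    for term in sorted(normalized, key=len, reverse=True):
        # Tolerate extra or unusual whitespace between the words of a multi-word term.
        alts.append(r"\s+".join(re.escape(w) for w in term.split()))
    if not alts:
        raise ValueError("vocabulary is empty")
    # Lookarounds stop "Falcon" from matching inside "Falconry".
    return re.compile(r"(?<!\w)(?:" + "|".join(alts) + r")(?!\w)", re.IGNORECASE)


def find_terms(text: str, pattern: re.Pattern) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text); offsets refer to the NFKC-normalized text."""
    norm = normalize(text)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(norm)]


if __name__ == "__main__":
    vocab = compile_vocab(["Project Falcon", "Falcon"])  # hypothetical internal terms
    print(find_terms("Status of ＰＲＯＪＥＣＴ   falcon: on track", vocab))
    # [(10, 26, 'PROJECT   falcon')]
    print(find_terms("Falconry club", vocab))  # []
```

Offsets point into the normalized string, so in this sketch the normalized text is what would be masked and sent. Because `\w` also matches CJK characters, the boundary lookarounds would block matches inside unsegmented Chinese or Japanese text, which needs a different boundary rule.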

## Approach

We evaluate multiple strategies for detecting and masking sensitive data:
- Prompt-based methods
- Retrieval-augmented approaches (RAG)
- Supervised fine-tuning (LoRA)
- Token classification (NER)
- Hybrid and ensemble methods
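One simple way to combine several detectors is character-level voting over the spans each one flags. This is an illustrative sketch, not DomainShield's actual ensembling rule; `vote_spans` and the example detector outputs are hypothetical, and spans are assumed to be `(start, end)` character offsets with an exclusive end.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def vote_spans(text_len: int, detector_outputs: List[List[Span]], min_votes: int = 1) -> List[Span]:
    """Keep characters flagged by at least `min_votes` detectors and return them as spans.

    min_votes=1 is a union (maximum recall); min_votes=len(detector_outputs) is an intersection.
    """
    if min_votes < 1:
        raise ValueError("min_votes must be >= 1")
    votes = [0] * text_len
    for spans in detector_outputs:
        flagged = [False] * text_len
        for start, end in spans:
            for i in range(max(0, start), min(end, text_len)):
                flagged[i] = True
        # Each detector votes at most once per character, even if its own spans overlap.
        for i, is_flagged in enumerate(flagged):
            votes[i] += int(is_flagged)

    merged: List[Span] = []
    run_start: Optional[int] = None
    for i, v in enumerate(votes + [0]):  # trailing 0 closes a run that reaches the end
        if v >= min_votes and run_start is None:
            run_start = i
        elif v < min_votes and run_start is not None:
            merged.append((run_start, i))
            run_start = None
    return merged


if __name__ == "__main__":
    # Hypothetical outputs from a prompt-based detector, an NER tagger, and a RAG-based detector.
    outputs = [[(0, 5), (10, 15)], [(3, 8)], [(10, 14)]]
    print(vote_spans(20, outputs, min_votes=1))  # [(0, 8), (10, 15)]
    print(vote_spans(20, outputs, min_votes=2))  # [(3, 5), (10, 14)]
```

A union favors recall, which matters here because any entity the ensemble misses is sent to the external API unmasked; higher thresholds trade that recall for less over-masking. In this simplified form, entity labels are dropped and adjacent entities collapse into a single span.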

## Status

Active research project. Models, benchmarks, and demos coming soon.