# DomainShield

**Privacy protection for LLM pipelines**

DomainShield is a research project focused on preventing sensitive data leakage when applications send data to external large language model (LLM) APIs.

## Overview

The system acts as a middleware firewall:
- Masks sensitive information before sending data to external LLMs
- Handles both general PII and domain-specific sensitive entities
- Reconstructs the original content after receiving the response
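The mask, call, and reconstruct cycle above can be sketched as follows. This is a minimal illustration, not DomainShield's implementation. It assumes entities have already been detected and arrive as `(value, label)` pairs. The `[LABEL_n]` placeholder format and the names `mask`, `unmask`, and `fake_llm` are hypothetical, and `fake_llm` stands in for a real external API call.

```python
import re
from typing import Dict, List, Tuple


def mask(text: str, entities: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Replace each entity value with a placeholder such as [EMAIL_1].

    Returns the masked text and a placeholder -> original value mapping.
    """
    labels = {value: label for value, label in entities if value}
    if not labels:
        return text, {}
    # One alternation, longest values first, so a longer entity wins when
    # two entities match at the same position.
    pattern = re.compile(
        "|".join(re.escape(v) for v in sorted(labels, key=len, reverse=True))
    )
    value_to_placeholder: Dict[str, str] = {}
    counters: Dict[str, int] = {}

    def substitute(match: "re.Match[str]") -> str:
        value = match.group(0)
        if value not in value_to_placeholder:
            label = labels[value]
            counters[label] = counters.get(label, 0) + 1
            value_to_placeholder[value] = f"[{label}_{counters[label]}]"
        return value_to_placeholder[value]

    masked = pattern.sub(substitute, text)
    return masked, {ph: value for value, ph in value_to_placeholder.items()}


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Swap every placeholder in an LLM response back to its original value."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(ph) for ph in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an external LLM API: it only sees placeholders.
    return f"Done: {prompt}"


if __name__ == "__main__":
    prompt = "Email ada@example.com about Project Falcon."
    masked, mapping = mask(prompt, [("ada@example.com", "EMAIL"), ("Project Falcon", "TERM")])
    print(masked)                             # Email [EMAIL_1] about [TERM_1].
    print(unmask(fake_llm(masked), mapping))  # Done: Email ada@example.com about Project Falcon.
```

Substituting in a single regex pass avoids rewriting text inside placeholders that were already inserted. Reusing one placeholder per distinct value keeps the response consistent when an entity appears more than once.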

## Key Focus

- PII masking (names, emails, identifiers)
- Domain-specific entity protection (internal terms, codes, private vocabularies)
- Multilingual robustness under noisy conditions
- Comparison of adaptation methods (prompting, RAG, fine-tuning, NER)

## Approach

We evaluate multiple strategies for detecting and masking sensitive data:
- Prompt-based methods
- Retrieval-augmented approaches (RAG)
- Supervised fine-tuning (LoRA)
- Token classification (NER)
- Hybrid and ensemble methods
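One simple way to combine several detectors is span-level voting followed by overlap resolution. This is shown as an assumption, not as the project's chosen ensemble. Spans proposed by at least `min_votes` detectors are kept, and where kept spans overlap, the longer one wins. Spans are hypothetical `(start, end, label)` tuples with exclusive end offsets, and the detector outputs in the example are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive


def merge_spans(outputs: Sequence[List[Span]], min_votes: int = 1) -> List[Span]:
    """Merge span predictions from several detectors.

    A span is kept if at least `min_votes` detectors proposed exactly that
    span; overlaps among kept spans are resolved in favor of the longer span.
    """
    # set() so a detector that repeats a span still casts only one vote.
    votes = Counter(span for spans in outputs for span in set(spans))
    candidates = [span for span, n in votes.items() if n >= min_votes]
    # Longest first; ties broken by earlier start, then label, for determinism.
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0], s[2]))
    kept: List[Span] = []
    for start, end, label in candidates:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical outputs from three detectors over the same text.
    rules = [(0, 5, "NAME"), (10, 25, "EMAIL")]
    ner = [(0, 5, "NAME"), (10, 18, "TERM")]
    finetuned = [(10, 25, "EMAIL"), (30, 34, "ID")]
    print(merge_spans([rules, ner, finetuned]))
    # [(0, 5, 'NAME'), (10, 25, 'EMAIL'), (30, 34, 'ID')]
    print(merge_spans([rules, ner, finetuned], min_votes=2))
    # [(0, 5, 'NAME'), (10, 25, 'EMAIL')]
```

With `min_votes=1` the merge is a union that favors recall. Raising it favors precision, since a span must then be proposed by several detectors before it is masked.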

## Status

Active research project. Models, benchmarks, and demos coming soon.