---
license: apache-2.0
base_model: deepseek-ai/deepseek-llm-7b-base
tags:
- phi-deidentification
- healthcare-nlp
- medical-text
- lora
- peft
- privacy
- ner
- safety
---

# DeepSeek PHI De-identification Adapter

This repository hosts a LoRA adapter fine-tuned for safe detection and redaction of
Protected Health Information (PHI) in clinical text.

The model is trained on a large synthetic and de-identified corpus derived from
MIMIC-III-style clinical notes and is designed to operate as part of a configurable,
explainable medical text de-identification pipeline.

---
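
As a rough illustration of the redaction step in such a pipeline, the sketch below replaces detected PHI spans with category placeholders. The `(start, end, label)` span format, the `redact` helper, and the example labels are assumptions made for illustration and are not taken from this repository; in practice the spans would come from the adapter's output.

```python
from typing import List, Tuple

# Hypothetical span format: (start_offset, end_offset, phi_label).
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected PHI span with a bracketed label placeholder."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip spans that overlap one already redacted.
            continue
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


note = "Patient John Doe was admitted on 2021-03-04."
spans = [(8, 16, "NAME"), (33, 43, "DATE")]
print(redact(note, spans))  # Patient [NAME] was admitted on [DATE].
```

Processing spans in sorted order and skipping overlaps keeps every offset valid against the original text.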

## Model Details

- **Developed by:** Iftakhar Khandokar (Marquette University)
- **Funded by:** Academic research (EECE Department, Marquette University)
- **Shared by:** Iftakhar Khandokar
- **Model type:** LoRA adapter (PEFT)
- **Base model:** `deepseek-ai/deepseek-llm-7b-base`
- **Language:** English (clinical / biomedical NLP)
- **License:** Apache 2.0

---

## Intended Use

This adapter is intended for:

- ✅ Research on medical data de-identification
- ✅ Benchmarking privacy-preserving NLP pipelines
- ✅ Safety and explainability evaluation for clinical LLM workflows

---
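
For the benchmarking use case, one common way to score a de-identification system is exact-match span precision, recall, and F1 against gold annotations. The sketch below is a minimal evaluation helper under that assumption; the `span_prf` name and the `(start, end, label)` span format are illustrative and not part of this repository.

```python
from typing import Set, Tuple

# Hypothetical span format: (start_offset, end_offset, phi_label).
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1."""
    tp = len(gold & pred)  # spans that match in offsets and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {(8, 16, "NAME"), (33, 43, "DATE")}
pred = {(8, 16, "NAME"), (20, 25, "LOCATION")}
p, r, f = span_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```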

## Not Intended For

- ❌ Automated medical diagnosis
- ❌ Direct patient care deployment without regulatory review
- ❌ Generating synthetic patient records for real-world use

---

## Loading the Model

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model the adapter was trained on.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-base", trust_remote_code=True
)

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(
    base,
    "Iftakhar/deepseek-phi-adapter"
)