# Quick Start - Security Features
## ⚡ 30-Second Setup for PII Protection
### Step 1: Enable Redaction in UI
```
☑ Enable PII Redaction
○ Redaction Level: moderate
```
### Step 2: Configure Environment
```bash
# Edit .env file
DEBUG_MODE=False
SANITIZE_LOGS=True
```
### Step 3: Use Safe Data
- ✅ Synthetic data (create_sample_transcripts.py)
- ✅ De-identified data (all 18 HIPAA identifiers removed)
- ❌ Real PHI on HuggingFace Spaces
That's it! 🎉
---
## 🚨 Critical Decision Tree
```
Do you have real patient/healthcare data?
├── YES → Contains ANY of these?
│   ├── Names, dates, SSN, MRN, emails, phones, addresses?
│   │   ├── YES → ⚠️ STOP! Cannot use HF Spaces!
│   │   │   └── Options:
│   │   │       1. Remove ALL 18 HIPAA identifiers (de-identify)
│   │   │       2. Deploy on AWS/Azure/GCP with BAA
│   │   │       3. Use synthetic data instead
│   │   └── NO → Proceed with redaction enabled
│   └── NO → Safe to use HF Spaces
└── NO → ✅ Safe to proceed
```
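The tree above reduces to two yes/no questions. As a minimal sketch, the same logic can be written as a small Python function; `Verdict` and `triage` are hypothetical names used only for illustration and are not part of the project.

```python
from enum import Enum


class Verdict(Enum):
    """Possible outcomes of the decision tree (hypothetical names)."""
    SAFE = "safe to proceed"
    REDACT = "proceed with redaction enabled"
    STOP = "stop: do not use HF Spaces"


def triage(has_real_patient_data: bool, contains_identifiers: bool) -> Verdict:
    """Mirror the decision tree: real data with identifiers is a hard stop."""
    if not has_real_patient_data:
        return Verdict.SAFE
    if contains_identifiers:
        # Names, dates, SSN, MRN, emails, phones, or addresses are present.
        return Verdict.STOP
    return Verdict.REDACT


if __name__ == "__main__":
    print(triage(True, True).value)
```

A `STOP` verdict leaves the three options listed in the tree: de-identify, deploy on infrastructure covered by a BAA, or switch to synthetic data.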
---
## 📋 Quick Redaction Levels Guide
| Level | What's Redacted | Use When |
|-------|----------------|----------|
| **Minimal** | SSN, MRN, Account # | Testing, low-risk data |
| **Moderate** | + Emails, Phones, Dates | **Recommended** - balanced protection |
| **Strict** | + Names, Addresses | Maximum protection, compliance testing |
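To illustrate how the levels build on each other, here is a rough sketch of cumulative, regex-based redaction. The patterns, the `LEVELS` mapping, and the `redact` function are assumptions made for this example; the project's actual `redaction.py` may work differently. Strict mode is left out because names and addresses generally need more than regular expressions to detect.

```python
import re

# Hypothetical patterns for illustration only; not taken from redaction.py.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "ACCOUNT": re.compile(r"\bAccount\s*#?\s*:?\s*\d{6,12}\b", re.IGNORECASE),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

# Each level is a superset of the one before it.
MINIMAL = ["SSN", "MRN", "ACCOUNT"]
LEVELS = {
    "minimal": MINIMAL,
    "moderate": MINIMAL + ["EMAIL", "PHONE", "DATE"],
}


def redact(text: str, level: str = "moderate") -> str:
    """Replace every match for the chosen level with a [LABEL] placeholder."""
    for label in LEVELS[level]:
        text = PATTERNS[label].sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "MRN: 12345678, call 555-123-4567 on 03/14/2024."
    print(redact(sample, "minimal"))
    print(redact(sample, "moderate"))
```

Simple patterns like these miss many real-world formats, which is one reason manual review of the output remains necessary.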
---
## The 18 HIPAA Identifiers (Must Remove ALL for De-identification)
1. Names
2. Geographic subdivisions smaller than a state
3. Dates (except year)
4. Phone numbers
5. Fax numbers
6. Email addresses
7. SSN
8. MRN
9. Health plan #
10. Account #
11. License #
12. Vehicle IDs
13. Device serial #
14. URLs
15. IP addresses
16. Biometrics
17. Full-face photos and comparable images
18. Other unique IDs
**The redaction module helps catch these, but always verify the output manually!**
---
## ⚙️ Environment Variables Cheat Sheet
```bash
# Security (ALWAYS set these in production)
DEBUG_MODE=False # No debug output
SANITIZE_LOGS=True # Redact PII from logs
# Logging
LOG_TO_FILE=True # Create audit trail
# LLM Backend (for HIPAA: use local)
USE_LMSTUDIO=True # ✅ Keeps data local
USE_HF_API=False # ❌ Sends to HF servers
# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1/chat/completions
```
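A startup check can catch a misconfigured environment before any data is processed. The sketch below compares the variables from the cheat sheet against their recommended values. `check_env` and `REQUIRED` are hypothetical names, and the case-insensitive comparison of `True`/`False` strings is an assumption about how the application parses these settings.

```python
import os
from typing import List, Mapping, Optional

# Recommended values from the cheat sheet, lowercased for comparison.
REQUIRED = {
    "DEBUG_MODE": "false",
    "SANITIZE_LOGS": "true",
    "USE_LMSTUDIO": "true",   # local backend, relevant for HIPAA workloads
    "USE_HF_API": "false",
}


def check_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.strip().lower() != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("WARNING:", problem)
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise in tests.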
---
## 🎯 Common Scenarios
### Scenario 1: Testing with Fake Data
```bash
# 1. Generate synthetic transcripts
python create_sample_transcripts.py --count 5 --synthetic
# 2. Upload to TranscriptorAI
# 3. Optional: enable redaction for testing
# 4. ✅ Safe: no real data
```
### Scenario 2: De-identified Research Data
```
1. Remove all 18 HIPAA identifiers manually
2. Enable redaction (moderate or strict)
3. Upload to TranscriptorAI
4. Review outputs and verify that no PII leaked
5. ✅ Safe if properly de-identified
```
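The output review in Scenario 2 can be partly automated with a scan for obvious leftovers. This is only a sketch: `scan` and `CHECKS` are hypothetical names, the patterns cover just a few identifier types, and a clean result does not prove the text is de-identified.

```python
import re
from collections import Counter

# Illustrative patterns for a handful of identifier types (assumed, not exhaustive).
CHECKS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://\S+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(text: str) -> Counter:
    """Count suspected identifiers per category in a piece of output text."""
    hits = Counter()
    for label, pattern in CHECKS.items():
        count = len(pattern.findall(text))
        if count:
            hits[label] = count
    return hits


if __name__ == "__main__":
    report = scan("Follow-up sent to a@b.org from 10.0.0.5")
    for label, count in report.items():
        print(f"{label}: {count} possible match(es)")
```

Any hit is a prompt for a human to look at the output, not a verdict on its own.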
### Scenario 3: Real Patient Data (HIPAA)
```
1. ⚠️ DO NOT use HuggingFace Spaces
2. Deploy on AWS HealthLake / Azure Health / GCP
3. Sign BAA with cloud provider
4. Configure encryption, MFA, audit logs
5. Enable PII redaction (strict mode)
6. ✅ Safe with proper infrastructure
```
---
## 🆘 Troubleshooting
**Problem:** "Redaction not working"
- ✅ Check HAS_REDACTION is True in logs
- ✅ Verify redaction.py exists
- ✅ Check "Enable PII Redaction" is checked
**Problem:** "Too much debug output"
- ✅ Set DEBUG_MODE=False in .env
- ✅ Restart application
**Problem:** "PII showing in logs"
- ✅ Set SANITIZE_LOGS=True in .env
- ✅ Check logger.py is imported
**Problem:** "Need to use real PHI"
- ✅ Read SECURITY_AND_COMPLIANCE.md
- ✅ Deploy on compliant infrastructure
- ✅ Never use HF Spaces for real PHI
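For the "PII showing in logs" case, it can help to see what log sanitization looks like in principle. The sketch below uses a standard `logging.Filter` to scrub messages before they are emitted. `SanitizeFilter` and its two patterns are assumptions for illustration; the project's `logger.py` may implement `SANITIZE_LOGS` differently.

```python
import logging
import re

# Illustrative patterns only; a real sanitizer would cover more identifiers.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class SanitizeFilter(logging.Filter):
    """Scrub SSNs and email addresses from each record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg with its %-style args
        message = SSN.sub("[SSN]", message)
        message = EMAIL.sub("[EMAIL]", message)
        record.msg = message
        record.args = None  # args are already merged into msg
        return True  # keep the record, now sanitized


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(SanitizeFilter())
    logger = logging.getLogger("sanitize_demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("Processed record for %s", "jane.doe@example.com")
```

The filter is attached to the handler rather than the logger so that records propagated from child loggers pass through it as well.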
---
## 📞 Quick Links
- **Full Security Guide:** `SECURITY_AND_COMPLIANCE.md`
- **What Changed:** `IMPROVEMENTS_SUMMARY.md`
- **General Docs:** `README.md`
- **HIPAA Guidance:** https://www.hhs.gov/hipaa
---
## ✅ Pre-Flight Checklist
Before uploading sensitive data:
- [ ] Read SECURITY_AND_COMPLIANCE.md
- [ ] Data is de-identified OR synthetic
- [ ] PII redaction enabled in UI
- [ ] DEBUG_MODE=False
- [ ] SANITIZE_LOGS=True
- [ ] Using local LLM (not HF API)
- [ ] Tested with fake data first
- [ ] Will manually review outputs
**If using real PHI:**
- [ ] Deployed on HIPAA infrastructure (NOT HF Spaces)
- [ ] BAA signed with cloud provider
- [ ] Compliance review completed
---
**Remember: When in doubt, use synthetic data!**