# Quick Start - Security Features

## ⚡ 30-Second Setup for PII Protection

### Step 1: Enable Redaction in UI
```
☑ Enable PII Redaction
○ Redaction Level: moderate
```

### Step 2: Configure Environment
```bash
# Edit .env file
DEBUG_MODE=False
SANITIZE_LOGS=True
```

### Step 3: Use Safe Data
- ✅ Synthetic data (create_sample_transcripts.py)
- ✅ De-identified data (all 18 HIPAA identifiers removed)
- ❌ Real PHI (protected health information) on HuggingFace Spaces

That's it! 🎉

---

## 🚨 Critical Decision Tree

```
Do you have real patient/healthcare data?
├── YES → Does it contain ANY identifiers
│         (names, dates, SSN, MRN, emails, phones, addresses)?
│   ├── YES → ⚠️ STOP! Do not use HF Spaces!
│   │   └── Options:
│   │       1. Remove ALL 18 HIPAA identifiers (de-identify)
│   │       2. Deploy on AWS/Azure/GCP with a BAA
│   │       3. Use synthetic data instead
│   └── NO → Proceed with redaction enabled
└── NO → ✅ Safe to proceed
```
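
The tree above can also be written as a small function, which is handy when gating uploads from a script. This is a minimal sketch, not part of TranscriptorAI; the names `Decision` and `triage` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    SAFE = "safe to proceed"
    PROCEED_WITH_REDACTION = "proceed with redaction enabled"
    STOP = "stop: do not use HF Spaces"


def triage(has_real_data: bool, has_identifiers: bool) -> Decision:
    """Mirror the decision tree: only real data with identifiers is blocked."""
    if not has_real_data:
        # Synthetic or fake data carries no patient information.
        return Decision.SAFE
    if has_identifiers:
        # Real data that still contains identifiers must not reach HF Spaces.
        return Decision.STOP
    return Decision.PROCEED_WITH_REDACTION
```

For example, `triage(has_real_data=True, has_identifiers=True)` returns `Decision.STOP`, which corresponds to the three options listed in the tree.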

---

## 📋 Quick Redaction Levels Guide

| Level | What's Redacted | Use When |
|-------|----------------|----------|
| **Minimal** | SSN, MRN, Account # | Testing, low-risk data |
| **Moderate** | + Emails, Phones, Dates | **Recommended** - balanced protection |
| **Strict** | + Names, Addresses | Maximum protection, compliance testing |
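
The levels are cumulative: each one redacts everything from the level below plus its own additions. A minimal sketch of that relationship follows. The category names (`ssn`, `mrn`, and so on) and the `categories_for` helper are hypothetical, and the project's `redaction.py` may organize this differently.

```python
# Categories added at each level, taken from the table above.
LEVEL_ADDITIONS = {
    "minimal": ["ssn", "mrn", "account_number"],
    "moderate": ["email", "phone", "date"],
    "strict": ["name", "address"],
}
LEVEL_ORDER = ["minimal", "moderate", "strict"]


def categories_for(level: str) -> list[str]:
    """Return every category redacted at `level`, including lower levels."""
    key = level.lower()
    if key not in LEVEL_ORDER:
        raise ValueError(f"unknown redaction level: {level!r}")
    cutoff = LEVEL_ORDER.index(key) + 1
    result = []
    for name in LEVEL_ORDER[:cutoff]:
        result.extend(LEVEL_ADDITIONS[name])
    return result
```

Calling `categories_for("moderate")` returns the three minimal categories followed by email, phone, and date.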

---

## 🔐 The 18 HIPAA Identifiers (Must Remove ALL for De-identification)

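1. Names
2. Geographic subdivisions smaller than a state
3. Dates more specific than the year (and ages over 89)
4. Phone numbers
5. Fax numbers
6. Email addresses
7. SSN
8. MRN
9. Health plan beneficiary #
10. Account #
11. Certificate/license #
12. Vehicle identifiers (including license plates)
13. Device identifiers and serial #
14. URLs
15. IP addresses
16. Biometric identifiers (fingerprints, voiceprints)
17. Full-face photos and comparable images
18. Any other unique identifying number, characteristic, or code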

**The redaction module helps with these but does not guarantee full removal. Always verify outputs manually!**
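
A quick pattern scan can point a reviewer at likely leaks during manual verification. The sketch below is illustrative only: it is not the project's `redaction.py`, the `find_identifiers` helper and its patterns are assumptions, and it covers just a few US-style formats. Free-text identifiers such as names and addresses cannot be caught this way.

```python
import re

# Illustrative patterns only; they cover a few common formats and miss others.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return the matches per category so a reviewer knows where to look."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

An empty result means only that these specific patterns did not match. It does not show that the text is de-identified.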

---

## ⚙️ Environment Variables Cheat Sheet

```bash
# Security (ALWAYS set these in production)
DEBUG_MODE=False              # No debug output
SANITIZE_LOGS=True           # Redact PII from logs

# Logging
LOG_TO_FILE=True             # Create audit trail

# LLM Backend (for HIPAA: use local)
USE_LMSTUDIO=True            # ✅ Keeps data local
USE_HF_API=False             # ❌ True would send data to HF servers

# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1/chat/completions
```
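
Environment values arrive as strings, and in Python `bool("False")` is `True`, so flags like these need explicit parsing. The sketch below shows one way to read them and flag unsafe combinations. How TranscriptorAI actually loads its `.env` is not shown here, so `env_flag`, `load_security_settings`, and `unsafe_settings` are hypothetical helpers.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag such as DEBUG_MODE=False from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def load_security_settings() -> dict[str, bool]:
    # Defaults lean toward the safe side when a variable is missing.
    return {
        "DEBUG_MODE": env_flag("DEBUG_MODE", default=False),
        "SANITIZE_LOGS": env_flag("SANITIZE_LOGS", default=True),
        "USE_LMSTUDIO": env_flag("USE_LMSTUDIO", default=True),
        "USE_HF_API": env_flag("USE_HF_API", default=False),
    }


def unsafe_settings(settings: dict[str, bool]) -> list[str]:
    """List the flags that differ from the values recommended above."""
    recommended = {
        "DEBUG_MODE": False,
        "SANITIZE_LOGS": True,
        "USE_LMSTUDIO": True,
        "USE_HF_API": False,
    }
    return [name for name, value in recommended.items() if settings[name] != value]
```

A startup check could refuse to run, or at least log a warning, when `unsafe_settings(load_security_settings())` is not empty.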

---

## 🎯 Common Scenarios

### Scenario 1: Testing with Fake Data
1. Run `python create_sample_transcripts.py --count 5 --synthetic`
2. Upload to TranscriptorAI
3. Optional: Enable redaction for testing
4. ✅ Safe: no real data

### Scenario 2: De-identified Research Data
1. Remove all 18 HIPAA identifiers manually
2. Enable redaction (moderate or strict)
3. Upload to TranscriptorAI
4. Review outputs and verify that no PII leaked
5. ✅ Safe if properly de-identified

### Scenario 3: Real Patient Data (HIPAA)
1. ⚠️ DO NOT use HuggingFace Spaces
2. Deploy on HIPAA-eligible infrastructure (AWS, Azure, or GCP)
3. Sign a Business Associate Agreement (BAA) with the cloud provider
4. Configure encryption, MFA, and audit logs
5. Enable PII redaction (strict mode)
6. ✅ Safe with proper infrastructure

---

## 🆘 Troubleshooting

**Problem:** "Redaction not working"
- ✅ Check HAS_REDACTION is True in logs
- ✅ Verify redaction.py exists
- ✅ Check "Enable PII Redaction" is checked
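
To confirm the first two checks from a Python shell, you can test whether a module named `redaction` is importable from the current directory. This sketch assumes that `HAS_REDACTION` reflects whether that import succeeds; the application may set the flag differently. `module_available` is a hypothetical helper.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True when a top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


# Run this from the application directory so redaction.py is on the path.
HAS_REDACTION = module_available("redaction")
print(f"HAS_REDACTION={HAS_REDACTION}")
```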

**Problem:** "Too much debug output"
- ✅ Set DEBUG_MODE=False in .env
- ✅ Restart application

**Problem:** "PII showing in logs"
- ✅ Set SANITIZE_LOGS=True in .env
- ✅ Check logger.py is imported
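
If PII still reaches the logs, it helps to know what log sanitization looks like in principle. The sketch below uses the standard library's `logging.Filter` to mask SSN-shaped and email-shaped strings. It is not the project's `logger.py`; `SanitizingFilter`, `get_sanitized_logger`, and the patterns are assumptions for illustration.

```python
import logging
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


class SanitizingFilter(logging.Filter):
    """Mask SSN- and email-shaped strings in each record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges %-style args into the message
        message = SSN_RE.sub("[SSN]", message)
        message = EMAIL_RE.sub("[EMAIL]", message)
        record.msg = message
        record.args = None  # args are already merged into msg
        return True  # keep the record, now sanitized


def get_sanitized_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.addFilter(SanitizingFilter())
    return logger
```

A filter attached to a logger only sees records logged directly through that logger. Attaching the same filter to a handler instead covers every record that handler receives.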

**Problem:** "Need to use real PHI"
- ✅ Read SECURITY_AND_COMPLIANCE.md
- ✅ Deploy on compliant infrastructure
- ✅ Never use HF Spaces for real PHI

---

## 📞 Quick Links

- **Full Security Guide:** `SECURITY_AND_COMPLIANCE.md`
- **What Changed:** `IMPROVEMENTS_SUMMARY.md`
- **General Docs:** `README.md`
- **HIPAA Guidance:** https://www.hhs.gov/hipaa

---

## ✅ Pre-Flight Checklist

Before uploading sensitive data:

- [ ] Read SECURITY_AND_COMPLIANCE.md
- [ ] Data is de-identified OR synthetic
- [ ] PII redaction enabled in UI
- [ ] DEBUG_MODE=False
- [ ] SANITIZE_LOGS=True
- [ ] Using local LLM (not HF API)
- [ ] Tested with fake data first
- [ ] Will manually review outputs

**If using real PHI:**
- [ ] Deployed on HIPAA infrastructure (NOT HF Spaces)
- [ ] BAA signed with cloud provider
- [ ] Compliance review completed

---

**Remember: When in doubt, use synthetic data!**