# Quick Start - Security Features

## ⚡ 30-Second Setup for PII Protection

### Step 1: Enable Redaction in UI
```
☑ Enable PII Redaction
○ Redaction Level: moderate
```

### Step 2: Configure Environment
```bash
# Edit .env file
DEBUG_MODE=False
SANITIZE_LOGS=True
```

### Step 3: Use Safe Data
- ✅ Synthetic data (create_sample_transcripts.py)
- ✅ De-identified data (all 18 HIPAA identifiers removed)
- ❌ Real PHI on HuggingFace Spaces

That's it! 🎉

---

## 🚨 Critical Decision Tree

```
Do you have real patient/healthcare data?
├── YES → Does it contain ANY of these?
│   │     (names, dates, SSN, MRN, emails, phones, addresses)
│   ├── YES → ⚠️ STOP! Cannot use HF Spaces!
│   │   └── Options:
│   │       1. Remove ALL 18 HIPAA identifiers (de-identify)
│   │       2. Deploy on AWS/Azure/GCP with BAA
│   │       3. Use synthetic data instead
│   └── NO → Safe to use HF Spaces; proceed with redaction enabled
└── NO → ✅ Safe to proceed
```
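
The decision tree can also be written as a small function, which is handy for an upload guard or a test. This is only a sketch: the function name `hosting_recommendation` and its return strings are hypothetical and are not part of the project code.

```python
def hosting_recommendation(has_real_patient_data: bool,
                           contains_identifiers: bool = False) -> str:
    """Return a recommendation that follows the decision tree above.

    Hypothetical helper for illustration; names and return values are assumed.
    """
    if not has_real_patient_data:
        # Synthetic or non-patient data: nothing to protect.
        return "safe to proceed"
    if contains_identifiers:
        # Real data with names, dates, SSN, MRN, emails, phones or addresses.
        return "stop: de-identify, deploy with a BAA, or use synthetic data"
    # Real data with no listed identifiers: keep redaction on as a safety net.
    return "proceed with redaction enabled"


if __name__ == "__main__":
    print(hosting_recommendation(True, contains_identifiers=True))
```

Returning a string rather than raising keeps the sketch easy to wire into a UI warning; a stricter integration might raise an exception on the "stop" branch instead.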

---

## 📋 Quick Redaction Levels Guide

| Level | What's Redacted | Use When |
|-------|----------------|----------|
| **Minimal** | SSN, MRN, Account # | Testing, low-risk data |
| **Moderate** | + Emails, Phones, Dates | **Recommended** - balanced protection |
| **Strict** | + Names, Addresses | Maximum protection, compliance testing |
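
The levels are cumulative: each one redacts everything the previous level does, plus more. A minimal sketch of that layering is shown below. It is not the project's `redaction.py`; the pattern groups, the regexes, and the `redact` function are all assumptions chosen for illustration, and real names and addresses generally need more than a regex.

```python
import re

# Illustrative patterns only; real identifiers vary far more than these.
MINIMAL = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
    "ACCOUNT": re.compile(r"\bAcct(?:ount)?\s*#?\s*\d{6,12}\b", re.IGNORECASE),
}
MODERATE = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}
STRICT = {
    # Toy pattern: only catches a title followed by capitalized name(s).
    "NAME": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)?"),
}

# Each level applies its own group plus every group below it.
LEVELS = {
    "minimal": [MINIMAL],
    "moderate": [MINIMAL, MODERATE],
    "strict": [MINIMAL, MODERATE, STRICT],
}


def redact(text: str, level: str = "moderate") -> str:
    """Replace every match at the chosen level with a [LABEL] placeholder."""
    for group in LEVELS[level]:
        for label, pattern in group.items():
            text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Dr. Smith saw MRN: 12345678 on 03/14/2024; call 555-123-4567."
    for name in LEVELS:
        print(name, "->", redact(sample, name))
```

Because a regex-only approach misses many identifiers, the output still needs manual review regardless of level.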

---

## 🔐 The 18 HIPAA Identifiers (Must Remove ALL for De-identification)

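1. Names
2. Geographic subdivisions smaller than a state
3. Dates (except year) directly related to an individual
4. Phone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers (SSN)
8. Medical record numbers (MRN)
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plates
13. Device identifiers and serial numbers
14. URLs
15. IP addresses
16. Biometric identifiers (e.g., fingerprints, voiceprints)
17. Full-face photographs and comparable images
18. Any other unique identifying number, characteristic, or code

**The redaction module helps with these, but always verify the output manually!**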

---

## ⚙️ Environment Variables Cheat Sheet

```bash
# Security (ALWAYS set these in production)
DEBUG_MODE=False              # No debug output
SANITIZE_LOGS=True            # Redact PII from logs

# Logging
LOG_TO_FILE=True              # Create audit trail

# LLM Backend (for HIPAA: use local)
USE_LMSTUDIO=True             # ✅ Keeps data local
USE_HF_API=False              # ❌ Sends to HF servers

# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1/chat/completions
```
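
To illustrate how flags like `SANITIZE_LOGS` and `DEBUG_MODE` might be consumed, here is a hedged sketch of a logging filter driven by those variables. It is not the project's `logger.py`: the names `SanitizingFilter`, `env_flag`, and `build_logger`, as well as the two regexes, are assumptions for illustration.

```python
import logging
import os
import re

# Illustrative patterns; a real sanitizer would cover more identifier types.
_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
]


class SanitizingFilter(logging.Filter):
    """Rewrite each log record so matched identifiers never reach a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with its %-style args
        for pattern, token in _PATTERNS:
            message = pattern.sub(token, message)
        record.msg = message
        record.args = ()  # args are already merged into msg
        return True       # keep the record, just sanitized


def env_flag(name: str, default: bool = False) -> bool:
    """Read a True/False style environment variable."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}


def build_logger(name: str = "transcriptor") -> logging.Logger:
    """Create a logger whose behavior follows the environment flags."""
    logger = logging.getLogger(name)
    if env_flag("SANITIZE_LOGS", default=True):
        logger.addFilter(SanitizingFilter())
    logger.setLevel(logging.DEBUG if env_flag("DEBUG_MODE") else logging.INFO)
    return logger
```

A filter attached to a logger only sees records logged through that logger; attaching the same filter to each handler instead would also cover records propagated from child loggers.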

---

## 🎯 Common Scenarios

### Scenario 1: Testing with Fake Data
```
1. python create_sample_transcripts.py --count 5 --synthetic
2. Upload to TranscriptorAI
3. Optional: Enable redaction for testing
4. ✅ Safe - no real data
```

### Scenario 2: De-identified Research Data
```
1. Remove all 18 HIPAA identifiers manually
2. Enable redaction (moderate or strict)
3. Upload to TranscriptorAI
4. Review outputs - verify no PII leaked
5. ✅ Safe if properly de-identified
```

### Scenario 3: Real Patient Data (HIPAA)
```
1. ⚠️ DO NOT use HuggingFace Spaces
2. Deploy on AWS HealthLake / Azure Health / GCP
3. Sign BAA with cloud provider
4. Configure encryption, MFA, audit logs
5. Enable PII redaction (strict mode)
6. ✅ Safe with proper infrastructure
```

---

## 🆘 Troubleshooting

**Problem:** "Redaction not working"
- ✅ Check `HAS_REDACTION` is True in logs
- ✅ Verify `redaction.py` exists
- ✅ Check "Enable PII Redaction" is checked

**Problem:** "Too much debug output"
- ✅ Set `DEBUG_MODE=False` in `.env`
- ✅ Restart application

**Problem:** "PII showing in logs"
- ✅ Set `SANITIZE_LOGS=True` in `.env`
- ✅ Check `logger.py` is imported

**Problem:** "Need to use real PHI"
- ✅ Read `SECURITY_AND_COMPLIANCE.md`
- ✅ Deploy on compliant infrastructure
- ✅ Never use HF Spaces for real PHI

---



## 📞 Quick Links

- **Full Security Guide:** `SECURITY_AND_COMPLIANCE.md`
- **What Changed:** `IMPROVEMENTS_SUMMARY.md`
- **General Docs:** `README.md`
- **HIPAA Guidance:** https://www.hhs.gov/hipaa

---

## ✅ Pre-Flight Checklist

Before uploading sensitive data:

- [ ] Read SECURITY_AND_COMPLIANCE.md
- [ ] Data is de-identified OR synthetic
- [ ] PII redaction enabled in UI
- [ ] DEBUG_MODE=False
- [ ] SANITIZE_LOGS=True
- [ ] Using local LLM (not HF API)
- [ ] Tested with fake data first
- [ ] Will manually review outputs

**If using real PHI:**
- [ ] Deployed on HIPAA infrastructure (NOT HF Spaces)
- [ ] BAA signed with cloud provider
- [ ] Compliance review completed
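
The environment-related items on this checklist can be verified automatically before an upload. The sketch below assumes the variable names from the cheat sheet; the `REQUIRED` mapping and the `preflight_problems` function are hypothetical and not part of the project. The remaining items (data provenance, BAA, compliance review) still need a human.

```python
import os

# Expected values, lowercased, taken from the environment variables cheat sheet.
REQUIRED = {
    "DEBUG_MODE": "false",
    "SANITIZE_LOGS": "true",
    "USE_LMSTUDIO": "true",
    "USE_HF_API": "false",
}


def preflight_problems(env=None):
    """Return a list of human-readable problems; an empty list means all clear."""
    env = os.environ if env is None else env
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.strip().lower() != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    for issue in issues:
        print("❌", issue)
    if not issues:
        print("✅ Environment settings match the checklist")
```

Passing a plain dict as `env` makes the check easy to unit-test without touching the real environment.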

---

**Remember: When in doubt, use synthetic data!**