# Quick Start - Security Features
## ⚡ 30-Second Setup for PII Protection
### Step 1: Enable Redaction in UI
```
☑ Enable PII Redaction
☑ Redaction Level: moderate
```
### Step 2: Configure Environment
```bash
# Edit .env file
DEBUG_MODE=False
SANITIZE_LOGS=True
```
### Step 3: Use Safe Data
- ✅ Synthetic data (create_sample_transcripts.py)
- ✅ De-identified data (all 18 HIPAA identifiers removed)
- ❌ Real PHI on HuggingFace Spaces
That's it!
---
## 🚨 Critical Decision Tree
```
Do you have real patient/healthcare data?
├── YES → Contains ANY of these?
│   ├── Names, dates, SSN, MRN, emails, phones, addresses?
│   │   ├── YES → ⚠️ STOP! Cannot use HF Spaces!
│   │   │   └── Options:
│   │   │       1. Remove ALL 18 HIPAA identifiers (de-identify)
│   │   │       2. Deploy on AWS/Azure/GCP with BAA
│   │   │       3. Use synthetic data instead
│   │   └── NO → Proceed with redaction enabled
│   └── NO → Safe to use HF Spaces
└── NO → ✅ Safe to proceed
```
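The tree above can also be expressed as a small function, which is handy if you want an upload script to refuse risky inputs. This is a minimal sketch only: `Verdict` and `triage` are hypothetical names, not part of TranscriptorAI, and the two boolean answers still have to come from a human review of the data.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "safe to proceed"
    REDACT = "proceed with redaction enabled"
    STOP = "stop: cannot use HF Spaces"


def triage(has_real_data: bool, has_identifiers: bool) -> Verdict:
    """Mirror the decision tree: real data with identifiers is a hard stop."""
    if not has_real_data:
        return Verdict.SAFE
    if has_identifiers:
        return Verdict.STOP
    return Verdict.REDACT


# Real patient data that still contains names or dates must not go to HF Spaces.
print(triage(has_real_data=True, has_identifiers=True).value)
```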
---
## Quick Redaction Levels Guide
| Level | What's Redacted | Use When |
|-------|----------------|----------|
| **Minimal** | SSN, MRN, Account # | Testing, low-risk data |
| **Moderate** | + Emails, Phones, Dates | **Recommended** - balanced protection |
| **Strict** | + Names, Addresses | Maximum protection, compliance testing |
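
The levels are cumulative: each one redacts everything the previous level does, plus more. The sketch below illustrates that idea for the first two levels. It is not the project's `redaction.py`; the patterns, the `PATTERNS` table, and the `redact` function are assumptions made for illustration. Account numbers and the strict level (names, addresses) are left out because they usually need more than simple regular expressions.

```python
import re

# Hypothetical patterns for illustration; real data needs broader coverage.
PATTERNS = {
    "minimal": {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    },
    "moderate": {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
        "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
        "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    },
}
LEVEL_ORDER = ["minimal", "moderate"]


def redact(text: str, level: str = "moderate") -> str:
    """Apply every pattern up to and including the requested level."""
    active = LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]
    for name in active:
        for label, pattern in PATTERNS[name].items():
            text = pattern.sub(f"[{label}]", text)
    return text


sample = "SSN 123-45-6789, call 555-123-4567, email jane@example.com on 03/14/2024"
print(redact(sample, "minimal"))
print(redact(sample, "moderate"))
```

With `minimal`, only the SSN is replaced. With `moderate`, the phone number, email address, and date are replaced as well.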
---
## The 18 HIPAA Identifiers (Must Remove ALL for De-identification)
1. Names
2. Geographic locations smaller than a state
3. Dates (except year)
4. Phone numbers
5. Fax numbers
6. Email addresses
7. SSN
8. MRN
9. Health plan #
10. Account #
11. License #
12. Vehicle IDs
13. Device serial #
14. URLs
15. IP addresses
16. Biometrics
17. Photos
18. Other unique IDs
**The redaction module helps with these, but verify manually!**
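
A short scan can support the manual check by flagging lines that still look like they contain an identifier. The sketch below covers only three of the 18 categories (email addresses, URLs, IP addresses). The `CHECKS` table and `find_identifiers` are hypothetical helpers, and an empty result does not prove that the text is de-identified.

```python
import re

# Hypothetical patterns covering three of the 18 identifiers.
CHECKS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://\S+"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, match) for every suspected identifier."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in CHECKS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, kind, match.group()))
    return hits


sample = "Patient seen at clinic.\nContact: a.b@example.org\nPortal http://10.0.0.5/login"
for lineno, kind, value in find_identifiers(sample):
    print(f"line {lineno}: possible {kind}: {value}")
```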
---
## ⚙️ Environment Variables Cheat Sheet
```bash
# Security (ALWAYS set these in production)
DEBUG_MODE=False # No debug output
SANITIZE_LOGS=True # Redact PII from logs
# Logging
LOG_TO_FILE=True # Create audit trail
# LLM Backend (for HIPAA: use local)
USE_LMSTUDIO=True # ✅ Keeps data local
USE_HF_API=False # ❌ True would send data to HF servers
# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1/chat/completions
```
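
Values in `.env` are plain strings, so `False` has to be parsed before it can be used as a boolean. The sketch below shows one way to read these flags and list any that differ from the production values above. `env_flag` and `unsafe_settings` are hypothetical helpers, and the sketch assumes the application has already loaded `.env` into the process environment.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean (case-insensitive)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def unsafe_settings() -> list[str]:
    """List settings that differ from the production values in the cheat sheet."""
    problems = []
    if env_flag("DEBUG_MODE"):
        problems.append("DEBUG_MODE should be False")
    if not env_flag("SANITIZE_LOGS"):
        problems.append("SANITIZE_LOGS should be True")
    if env_flag("USE_HF_API"):
        problems.append("USE_HF_API should be False")
    return problems


for problem in unsafe_settings():
    print("WARNING:", problem)
```

Note that an unset `SANITIZE_LOGS` is reported as a problem here, since the sketch treats a missing flag as `False`.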
---
## 🎯 Common Scenarios
### Scenario 1: Testing with Fake Data
```bash
# 1. Generate synthetic transcripts
python create_sample_transcripts.py --count 5 --synthetic
# 2. Upload the generated files to TranscriptorAI
# 3. Optional: enable redaction for testing
# 4. ✅ Safe - no real data
```
### Scenario 2: De-identified Research Data
```bash
1. Remove all 18 HIPAA identifiers manually
2. Enable redaction (moderate or strict)
3. Upload to TranscriptorAI
4. Review outputs - verify no PII leaked
5. ✅ Safe if properly de-identified
```
### Scenario 3: Real Patient Data (HIPAA)
```bash
1. ⚠️ DO NOT use HuggingFace Spaces
2. Deploy on AWS HealthLake / Azure Health / GCP
3. Sign BAA with cloud provider
4. Configure encryption, MFA, audit logs
5. Enable PII redaction (strict mode)
6. ✅ Safe with proper infrastructure
```
---
## Troubleshooting
**Problem:** "Redaction not working"
- ✅ Check HAS_REDACTION is True in logs
- ✅ Verify redaction.py exists
- ✅ Check "Enable PII Redaction" is checked
**Problem:** "Too much debug output"
- ✅ Set DEBUG_MODE=False in .env
- ✅ Restart application
**Problem:** "PII showing in logs"
- ✅ Set SANITIZE_LOGS=True in .env
- ✅ Check logger.py is imported
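
For context on the "PII showing in logs" problem, log sanitizing is commonly done with a `logging.Filter` that rewrites each record before it is emitted. The sketch below uses only the standard library and is not the project's `logger.py`. `RedactingFilter` and its two patterns are assumptions chosen to show the mechanism.

```python
import logging
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactingFilter(logging.Filter):
    """Scrub SSN- and email-shaped strings from each record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge args into the final message
        message = SSN.sub("[SSN]", message)
        message = EMAIL.sub("[EMAIL]", message)
        record.msg = message
        record.args = ()  # args are already merged above
        return True  # keep the record, just sanitized


logger = logging.getLogger("transcript_demo")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Contact %s, SSN %s", "jane@example.com", "123-45-6789")
```

Attaching the filter to the handler rather than to a single logger means every record that reaches that handler is sanitized, including records from child loggers.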
**Problem:** "Need to use real PHI"
- ✅ Read SECURITY_AND_COMPLIANCE.md
- ✅ Deploy on compliant infrastructure
- ✅ Never use HF Spaces for real PHI
---
## Quick Links
- **Full Security Guide:** `SECURITY_AND_COMPLIANCE.md`
- **What Changed:** `IMPROVEMENTS_SUMMARY.md`
- **General Docs:** `README.md`
- **HIPAA Guidance:** https://www.hhs.gov/hipaa
---
## ✅ Pre-Flight Checklist
Before uploading sensitive data:
- [ ] Read SECURITY_AND_COMPLIANCE.md
- [ ] Data is de-identified OR synthetic
- [ ] PII redaction enabled in UI
- [ ] DEBUG_MODE=False
- [ ] SANITIZE_LOGS=True
- [ ] Using local LLM (not HF API)
- [ ] Tested with fake data first
- [ ] Will manually review outputs
**If using real PHI:**
- [ ] Deployed on HIPAA infrastructure (NOT HF Spaces)
- [ ] BAA signed with cloud provider
- [ ] Compliance review completed
---
**Remember: When in doubt, use synthetic data!**