# Quick Start - Security Features

## 30-Second Setup for PII Protection

### Step 1: Enable Redaction in UI

```
[x] Enable PII Redaction
Redaction Level: moderate
```

### Step 2: Configure Environment

```bash
# Edit .env file
DEBUG_MODE=False
SANITIZE_LOGS=True
```

### Step 3: Use Safe Data

- ✅ Synthetic data (`create_sample_transcripts.py`)
- ✅ De-identified data (all 18 HIPAA identifiers removed)
- ❌ Real PHI on HuggingFace Spaces

That's it!

---
## Critical Decision Tree

```
Do you have real patient/healthcare data?
├── YES → Contains ANY of these?
│   ├── Names, dates, SSN, MRN, emails, phones, addresses?
│   │   ├── YES → ⚠️ STOP! Cannot use HF Spaces!
│   │   │   └── Options:
│   │   │       1. Remove ALL 18 HIPAA identifiers (de-identify)
│   │   │       2. Deploy on AWS/Azure/GCP with BAA
│   │   │       3. Use synthetic data instead
│   │   └── NO → Proceed with redaction enabled
│   └── NO → Safe to use HF Spaces
└── NO → ✅ Safe to proceed
```

---
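The decision tree above can be sketched as a small function. This is only an illustration: `Verdict`, `STOP_OPTIONS`, and `triage` are hypothetical names, not part of TranscriptorAI, and the two boolean inputs still have to come from a human review of the data.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "safe to proceed"
    REDACT = "proceed with redaction enabled"
    STOP = "stop: cannot use HF Spaces"


# Options the tree lists when real data still carries identifiers.
STOP_OPTIONS = (
    "Remove all 18 HIPAA identifiers (de-identify)",
    "Deploy on AWS/Azure/GCP with a BAA",
    "Use synthetic data instead",
)


def triage(has_real_data: bool, has_identifiers: bool) -> Verdict:
    """Walk the decision tree for one dataset (hypothetical helper)."""
    if not has_real_data:
        return Verdict.SAFE
    if has_identifiers:
        return Verdict.STOP
    return Verdict.REDACT
```

For example, `triage(True, True)` returns `Verdict.STOP`, at which point one of the `STOP_OPTIONS` applies.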
## Quick Redaction Levels Guide

Each level includes everything redacted by the level above it.

| Level | What's Redacted | Use When |
|-------|----------------|----------|
| **Minimal** | SSN, MRN, Account # | Testing, low-risk data |
| **Moderate** | Minimal + Emails, Phones, Dates | **Recommended** - balanced protection |
| **Strict** | Moderate + Names, Addresses | Maximum protection, compliance testing |

---
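To make the cumulative levels concrete, here is a minimal sketch of level-based redaction. It is not the project's `redaction.py`: the regular expressions, the `PATTERNS` and `LEVELS` tables, and the `redact` function are assumptions for illustration, and real SSN, MRN, phone, and date formats vary far more than these patterns cover. The strict level is left out because names and addresses are not reliably matched by simple patterns.

```python
import re

# Illustrative patterns only (assumed formats, not the project's rules).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "ACCOUNT": re.compile(r"\b(?:Account|Acct)\s*#?:?\s*\d{6,}\b", re.IGNORECASE),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

# Each level repeats the labels of the previous one and adds more.
LEVELS = {
    "minimal": ("SSN", "MRN", "ACCOUNT"),
    "moderate": ("SSN", "MRN", "ACCOUNT", "EMAIL", "PHONE", "DATE"),
}


def redact(text: str, level: str = "moderate") -> str:
    """Replace every match for the chosen level with a [LABEL] token."""
    for label in LEVELS[level]:
        text = PATTERNS[label].sub(f"[{label}]", text)
    return text
```

With `level="minimal"` only the SSN, MRN, and account patterns run, so emails, phone numbers, and dates pass through unchanged.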
## The 18 HIPAA Identifiers (Must Remove ALL for De-identification)

1. Names
2. Geographic subdivisions smaller than a state
3. Dates (except year)
4. Phone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers (SSN)
8. Medical record numbers (MRN)
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers (including license plates)
13. Device identifiers and serial numbers
14. URLs
15. IP addresses
16. Biometric identifiers (fingerprints, voiceprints)
17. Full-face photographs
18. Any other unique identifying number, characteristic, or code

**The redaction module helps with these, but always verify manually!**

---
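As one aid to that manual verification, a quick scan can flag text that still appears to contain a few of the machine-detectable identifiers. This is a hedged sketch: `CHECKS` and `find_leftovers` are hypothetical names, the patterns cover only URLs, IP addresses, and emails, and an empty result does not prove the text is de-identified.

```python
import re

# Assumed patterns for three of the 18 identifiers; the rest need other checks.
CHECKS = {
    "url": re.compile(r"https?://\S+"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_leftovers(text: str) -> dict[str, list[str]]:
    """Return the matches found per identifier type, omitting empty ones."""
    hits = {}
    for label, pattern in CHECKS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

Running it over model outputs as well as inputs helps catch identifiers that slip through redaction.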
## Environment Variables Cheat Sheet

```bash
# Security (ALWAYS set these in production)
DEBUG_MODE=False      # No debug output
SANITIZE_LOGS=True    # Redact PII from logs

# Logging
LOG_TO_FILE=True      # Create audit trail

# LLM Backend (for HIPAA: use local)
USE_LMSTUDIO=True     # ✅ Keeps data local
USE_HF_API=False      # ❌ Sends data to HF servers

# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1/chat/completions
```

---
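The settings above can be checked at startup with a few lines of Python. The sketch below is an assumption about how such a check might look, not the application's actual loader: `env_flag` and `production_problems` are hypothetical helpers, and treating an unset variable as unsafe is a design choice made here.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default, environ=None):
    """Parse a boolean-like environment variable, falling back to default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def production_problems(environ=None):
    """List settings that differ from the cheat sheet (unset counts as unsafe)."""
    problems = []
    if env_flag("DEBUG_MODE", True, environ):
        problems.append("DEBUG_MODE should be False")
    if not env_flag("SANITIZE_LOGS", False, environ):
        problems.append("SANITIZE_LOGS should be True")
    if env_flag("USE_HF_API", True, environ):
        problems.append("USE_HF_API should be False (use a local LLM)")
    return problems
```

Calling `production_problems()` with no argument reads the real process environment; passing a dict makes the check easy to test.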
## Common Scenarios

### Scenario 1: Testing with Fake Data

```
1. python create_sample_transcripts.py --count 5 --synthetic
2. Upload to TranscriptorAI
3. Optional: Enable redaction for testing
4. ✅ Safe - no real data
```

### Scenario 2: De-identified Research Data

```
1. Remove all 18 HIPAA identifiers manually
2. Enable redaction (moderate or strict)
3. Upload to TranscriptorAI
4. Review outputs - verify no PII leaked
5. ✅ Safe if properly de-identified
```

### Scenario 3: Real Patient Data (HIPAA)

```
1. ⚠️ DO NOT use HuggingFace Spaces
2. Deploy on AWS HealthLake / Azure Health / GCP
3. Sign BAA with cloud provider
4. Configure encryption, MFA, audit logs
5. Enable PII redaction (strict mode)
6. ✅ Safe with proper infrastructure
```

---
## Troubleshooting

**Problem:** "Redaction not working"

- ✅ Check `HAS_REDACTION` is `True` in logs
- ✅ Verify `redaction.py` exists
- ✅ Check "Enable PII Redaction" is checked

**Problem:** "Too much debug output"

- ✅ Set `DEBUG_MODE=False` in `.env`
- ✅ Restart application

**Problem:** "PII showing in logs"

- ✅ Set `SANITIZE_LOGS=True` in `.env`
- ✅ Check `logger.py` is imported

**Problem:** "Need to use real PHI"

- ✅ Read `SECURITY_AND_COMPLIANCE.md`
- ✅ Deploy on compliant infrastructure
- ❌ Never use HF Spaces for real PHI

---
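For the "PII showing in logs" case, it can help to see what log sanitization looks like in principle. The sketch below uses the standard `logging.Filter` hook. It is not the project's `logger.py`: `SanitizeFilter` and its two patterns are assumptions for illustration only.

```python
import logging
import re

# Assumed patterns; a real sanitizer would cover many more identifier types.
REPLACEMENTS = (
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
)


class SanitizeFilter(logging.Filter):
    """Rewrite each record's message with PII-like matches masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with its arguments merged in
        for pattern, token in REPLACEMENTS:
            message = pattern.sub(token, message)
        record.msg = message
        record.args = ()  # arguments are already merged into msg
        return True  # keep the record, only its text changes
```

Attaching it with `handler.addFilter(SanitizeFilter())` applies the masking to every record that handler emits.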
## Quick Links

- **Full Security Guide:** `SECURITY_AND_COMPLIANCE.md`
- **What Changed:** `IMPROVEMENTS_SUMMARY.md`
- **General Docs:** `README.md`
- **HIPAA Guidance:** https://www.hhs.gov/hipaa

---
## Pre-Flight Checklist

Before uploading sensitive data:

- [ ] Read `SECURITY_AND_COMPLIANCE.md`
- [ ] Data is de-identified OR synthetic
- [ ] PII redaction enabled in UI
- [ ] `DEBUG_MODE=False`
- [ ] `SANITIZE_LOGS=True`
- [ ] Using local LLM (not HF API)
- [ ] Tested with fake data first
- [ ] Will manually review outputs

**If using real PHI:**

- [ ] Deployed on HIPAA infrastructure (NOT HF Spaces)
- [ ] BAA signed with cloud provider
- [ ] Compliance review completed

---

**Remember: When in doubt, use synthetic data!**