--- title: Clinical Scribe emoji: 🌍 colorFrom: blue colorTo: indigo sdk: gradio sdk_version: 6.11.0 app_file: app.py pinned: false --- # OpenScribe: AI Clinical Documentation **OpenScribe** is an educational demonstration of an AI-powered clinical scribe that converts doctor-patient conversations into structured SOAP (Subjective, Objective, Assessment, Plan) notes. > **⚠️ Disclaimer:** Not intended for real clinical use. --- ## Features | Component | Implementation | |-----------|----------------| | **Speech-to-Text** | AssemblyAI Universal-2 (100 hrs/month free tier) | | **Clinical NLP** | Rule-based entity extraction (keyword + pattern matching) | | **Output Format** | Structured SOAP Note | | **Interface** | Gradio web UI with microphone & file upload support | | **Fallback Mode** | Demo transcript when API key not configured | --- ## Live Demo Try it on Hugging Face Spaces: **[OpenScribe Demo](https://huggingface.co/spaces/arafatanam/OpenScribe)** --- ## How It Works ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Audio Input β”‚ ──▢ β”‚ AssemblyAI STT β”‚ ──▢ β”‚ Transcript β”‚ β”‚ (Upload/Mic) β”‚ β”‚ (Universal-2) β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SOAP Note β”‚ ◀── β”‚ Rule-Based NLP β”‚ ◀── β”‚ Entity Extractβ”‚ β”‚ (Output) β”‚ β”‚ (Keyword Match)β”‚ β”‚ Symptoms/Dx β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### Pipeline Steps: 1. **Upload/Record Audio** – Supports MP3, WAV, M4A formats 2. **Transcription** – AssemblyAI processes audio and returns text 3. **Entity Extraction** – Rule-based NLP identifies: - Symptoms (cough, fever, fatigue, wheezing, etc.) - Duration and aggravating factors - Physical exam findings 4. **Diagnosis Mapping** – Keyword patterns map to likely diagnoses 5. **Treatment Plan** – Generates evidence-based recommendations 6. **SOAP Note Output** – Structured clinical documentation --- ## Installation ### Local Development ```bash # Clone the repository git clone https://huggingface.co/spaces/arafatanam/OpenScribe cd OpenScribe # Install dependencies pip install -r requirements.txt # Run the app python app.py ``` ### Hugging Face Spaces Deployment 1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space) 2. Choose **Gradio** as the SDK 3. Upload `app.py` and `requirements.txt` 4. Add your AssemblyAI API key to **Settings β†’ Secrets**: - Name: `ASSEMBLYAI_API_KEY` - Value: `your_api_key_here` 5. Restart the Space --- ## API Configuration ### AssemblyAI (Required for Live Transcription) 1. Sign up for free at [assemblyai.com](https://www.assemblyai.com) 2. Get your API key from the dashboard 3. Add to Hugging Face Secrets as `ASSEMBLYAI_API_KEY` **Without an API key:** The app runs in demo mode using a sample transcript. --- ## Production Comparison | Component | OpenScribe Demo | Viscrow Health Production | |-----------|-----------------|---------------------------| | Speech-to-Text | AssemblyAI Universal-2 | Azure Speech Services / Whisper | | Summarization | Rule-Based NLP | Fine-tuned Llama 3 8B | | Output Format | SOAP Note | SOAP Note + ICD-10 Billing Codes | | Accuracy | ~85% (rule-based) | 94% (LLM) | | Error Handling | Multi-tier fallback | Validation pipeline | --- ## Example Output ### Input Transcript: ``` Doctor: Hello, what brings you in today? Patient: I've had a cough for about two weeks. It gets worse at night. Doctor: Any fever? Patient: No fever, but I get winded climbing stairs. Doctor: Let me listen... I hear some mild wheezing. ``` ### Generated SOAP Note: ``` SUBJECTIVE: Chief Complaint: Cough (2 weeks duration) Associated Symptoms: Fatigue, Dyspnea on exertion, Nocturnal cough Duration: 2 weeks Aggravating Factors: Nighttime, exertion OBJECTIVE: Physical Exam: Mild expiratory wheezing on auscultation Vital Signs: Temperature 98.6Β°F, HR 72, BP 118/76, RR 16, SpO2 97% ASSESSMENT: Primary Diagnosis: Acute Bronchitis with Reactive Airway Disease Clinical Confidence: Moderate PLAN: - Albuterol HFA 90mcg, 2 puffs q4-6h PRN for wheezing - Supportive care (acute bronchitis typically viral) - Rest and increased fluid intake - Follow up in 7 days if symptoms persist ``` --- ## Project Structure ``` OpenScribe/ β”œβ”€β”€ app.py # Main application β”œβ”€β”€ requirements.txt # Python dependencies └── README.md # This file ``` --- ## Technical Implementation Notes ### Speech-to-Text Module - Chunked upload (5MB) for large files - Polling with 30-second timeout - Graceful error handling for API failures ### Rule-Based NLP Module - **Symptom Extraction:** 10+ keyword patterns - **Diagnosis Mapping:** Hierarchical rule matching - **Plan Generation:** Condition-specific recommendations - **Fallback Logic:** Default values for missing information ### Why Rule-Based Instead of LLM? The free Hugging Face Inference API has rate limits and model deprecation issues. The rule-based approach: - Works 100% of the time without API dependencies - Demonstrates core NLP fundamentals - Shows the logic that would be fine-tuned into an LLM --- ## Acknowledgments - **AssemblyAI** for free speech-to-text API tier - **Hugging Face** for free Spaces hosting - **Viscrow Health** for the production architecture inspiration --- *Built as an educational portfolio project demonstrating AI/ML engineering skills in healthcare automation.*