Spaces: taraky/Medical_Document_Retrieval

tarakjc2c committed 6 days ago

Commit c28400a (1 parent: d2aea0d)

Fix README metadata

Files changed (1)

README.md +14 -81

@@ -1,88 +1,21 @@
 ---
 emoji:
 sdk: gradio
 sdk_version: 6.0.2
 app_file: app_retrieval_cached.py
 pinned: false
 ---
-title: Medical_Document_Retrieval
-app_file: app_retrieval_cached.py
-sdk: gradio
-sdk_version: 6.0.2
----
-# Health Query Classifier & Research Retriever
-## Team Members
-*   **David Gray**
-*   **Tarak Jha**
-*   **Sravani Segireddy**
-*   **Riley Millikan**
-*   **Kent R. Spillner**
-## Project Description
-This project is a classifier that triages patient queries. If a query is identified as medical, the system retrieves relevant research and presents it to the user.
-## Workflow
-The system operates in two main stages to optimize patient care and provider efficiency:
-1.  **Classification (Triage)**:
-    The tool analyzes the user's input to determine if it is a medical query (requiring clinical attention) or an administrative query (scheduling, billing, etc.).
-2.  **Research Retrieval**:
-    If the query is classified as medical, the system searches through indexed medical databases (like PubMed and Miriad) to retrieve relevant research articles and Q/A pairs. This empowers the patient with trustworthy information and provides the doctor with context.
-### Training Script
-```bash
-python3 -m classifier.train
-```
-## Running the System Locally
-### Prerequisites
-*   Git
-*   Python 3
-### Setup & Configuration
-1.  **Clone the repository**
-    ```bash
-    git clone https://github.com/davidgraymi/health-query-classifier.git
-    cd health-query-classifier
-    ```
-2.  **Configure environment variables**
-    This project uses an `env.list` file for configuration. Create this file in the root directory.
-    ```ini
-    # env.list
-    HF_TOKEN="your-huggingface-token"
-    ```
-    *   **HF_TOKEN**: Access token can be generated via [huggingface](https://huggingface.co/settings/tokens). The token must have read permissions.
-3.  **Create a python virtual environment**
-    ```bash
-    python3 -m venv .venv
-    source .venv/bin/activate
-    ```
-4.  **Install dependencies**
-    ```bash
-    pip install -r requirements.txt
-    ```
-### Data Setup
-```bash
-python3 adapters/build_corpora.py
-```
-### Execution
-```bash
-python3 main.py
-```

 ---
+title: Medical Document Retrieval
 emoji:
+colorFrom: blue
+colorTo: green
 sdk: gradio
 sdk_version: 6.0.2
 app_file: app_retrieval_cached.py
 pinned: false
 ---
+# Medical Document Retrieval System
+This system uses BM25 + Dense Embeddings + RRF Fusion to search across 10,000+ medical documents.
+**Models:**
+- BM25 Index (keyword-based)
+- Dense Embeddings (embeddinggemma-300m-medical)
+- RRF Fusion (combines both approaches)
+**Note:** First startup takes 5-8 minutes to build indexes. Please be patient!
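The new README says that RRF Fusion combines the BM25 and dense-embedding results but does not show how. Below is a minimal sketch of Reciprocal Rank Fusion in Python. It assumes that each retriever returns a ranked list of document IDs and uses the commonly cited default constant k = 60. The Space's actual code and parameter values do not appear in this commit. The function name `rrf_fuse` and the document IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. k = 60 is an assumed default, not the Space's value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from the keyword (BM25) and dense retrievers.
bm25_hits = ["pubmed:12", "miriad:7", "pubmed:3"]
dense_hits = ["miriad:7", "pubmed:44", "pubmed:12"]

for doc_id, score in rrf_fuse([bm25_hits, dense_hits]):
    print(f"{doc_id}\t{score:.5f}")
```

RRF uses rank positions rather than raw scores, so BM25 scores and embedding similarities never need to be placed on a common scale. A document that both retrievers rank near the top (`miriad:7` in this toy example) ends up first.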