tarakjc2c committed on
Commit c28400a · 1 Parent(s): d2aea0d

Fix README metadata

Files changed (1)
  1. README.md +14 -81
README.md CHANGED
@@ -1,88 +1,21 @@
  ---
  emoji:
  sdk: gradio
  sdk_version: 6.0.2
  app_file: app_retrieval_cached.py
  pinned: false
  ---
- title: Medical_Document_Retrieval
- app_file: app_retrieval_cached.py
- sdk: gradio
- sdk_version: 6.0.2
- ---
- # Health Query Classifier & Research Retriever
-
- ## Team Members
- * **David Gray**
- * **Tarak Jha**
- * **Sravani Segireddy**
- * **Riley Millikan**
- * **Kent R. Spillner**
-
- ## Project Description
- This project is a classifier that triages patient queries. If a query is identified as medical, the system retrieves relevant research and presents it to the user.
-
- ## Workflow
- The system operates in two main stages to optimize patient care and provider efficiency:
-
- 1. **Classification (Triage)**:
- The tool analyzes the user's input to determine if it is a medical query (requiring clinical attention) or an administrative query (scheduling, billing, etc.).
-
- 2. **Research Retrieval**:
- If the query is classified as medical, the system searches through indexed medical databases (like PubMed and Miriad) to retrieve relevant research articles and Q/A pairs. This empowers the patient with trustworthy information and provides the doctor with context.
-
- ### Training Script
-
- ```bash
- python3 -m classifier.train
- ```
-
- ## Running the System Locally
-
- ### Prerequisites
- * Git
- * Python 3
-
- ### Setup & Configuration
-
- 1. **Clone the repository**
-
- ```bash
- git clone https://github.com/davidgraymi/health-query-classifier.git
- cd health-query-classifier
- ```
-
- 2. **Configure environment variables**
-
- This project uses an `env.list` file for configuration. Create this file in the root directory.
- ```ini
- # env.list
- HF_TOKEN="your-huggingface-token"
- ```
- * **HF_TOKEN**: Access token can be generated via [huggingface](https://huggingface.co/settings/tokens). The token must have read permissions.
-
- 3. **Create a python virtual environment**
-
- ```bash
- python3 -m venv .venv
- source .venv/bin/activate
- ```
-
- 4. **Install dependencies**
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### Data Setup
-
- ```bash
- python3 adapters/build_corpora.py
- ```
-
- ### Execution
-
- ```bash
- python3 main.py
- ```
-
 
  ---
+ title: Medical Document Retrieval
  emoji:
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
  sdk_version: 6.0.2
  app_file: app_retrieval_cached.py
  pinned: false
  ---
+
+ # Medical Document Retrieval System
+
+ This system uses BM25 + Dense Embeddings + RRF Fusion to search across 10,000+ medical documents.
+
+ **Models:**
+ - BM25 Index (keyword-based)
+ - Dense Embeddings (embeddinggemma-300m-medical)
+ - RRF Fusion (combines both approaches)
+
+ **Note:** First startup takes 5-8 minutes to build indexes. Please be patient!
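
The new README names RRF Fusion as the step that combines the BM25 and dense-embedding results but does not show how. The following is a minimal sketch of Reciprocal Rank Fusion in general, not the code in `app_retrieval_cached.py`: the function name `rrf_fuse`, the document ids, and the smoothing constant `k=60` (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is a hypothetical default, not a value
    taken from the repository.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a keyword (BM25) and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, the BM25 scores and embedding similarities never need to be put on a common scale; documents that both retrievers rank well rise to the top.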