---
tags:
- Agent
- arxiv:2508.02258
---

# Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning

\[[Arxiv](https://arxiv.org/abs/2508.02258)\] | \[[Github Repo](https://github.com/Wenchuan-Zhang/Patho-AgenticRAG)\] | \[[Cite](#citation)\]
|
| 16 |
+
|
| 17 |
## Introduction馃摑
|
| 18 |
**Vision Language Models** have demonstrated significant potential in medical imaging tasks, but pathology presents unique challenges due to its ultra-high resolution, complex tissue structures, and nuanced clinical semantics. These challenges often lead to **hallucinations** in VLMs, where the outputs are inconsistent with the visual evidence, undermining clinical trust. Current **Retrieval-Augmented Generation (RAG)** approaches predominantly rely on text-based knowledge bases, limiting their ability to effectively incorporate critical visual information from pathology images.

To address these challenges, we introduce **Patho-AgenticRAG**, a **multimodal RAG** framework for pathology VLMs.

Our experiments demonstrate that Patho-AgenticRAG significantly outperforms existing multimodal models on tasks such as multiple-choice diagnosis and visual question answering.

![pipeline](fig/teaser.png)

## Quickstart🏃
This document outlines the workflow for setting up and running the **Patho-AgenticRAG** framework: ingesting pathology PDF images, downloading the models, and serving them for inference via API servers. Follow the steps below.

### 1. Milvus Ingestion
To ingest pathology images into Milvus for searching:
```bash
python milvus_ingestion.py
```

### 2. Milvus Search Engine API
Next, run the Milvus search engine API to handle the retrieval process:
```bash
python milvus_search_engine_api.py
```
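
Once the search API is up, a retrieval request can be issued from Python. Below is a minimal client sketch; the base URL, the `/search` path, and the `query`/`top_k` field names are assumptions for illustration, not taken from the repo — check `milvus_search_engine_api.py` for the real interface.

```python
# Minimal client sketch for the Milvus search engine API started above.
# ASSUMPTIONS: the base URL, /search path, and query/top_k fields are
# illustrative placeholders, not the repo's actual interface.
import json
from urllib import request

def build_search_request(query: str, top_k: int = 5,
                         base_url: str = "http://localhost:8002") -> request.Request:
    """Build (but do not send) a JSON POST request for page retrieval."""
    payload = {"query": query, "top_k": top_k}
    return request.Request(
        f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("ductal carcinoma in situ, low grade")
print(req.full_url)  # http://localhost:8002/search
```

Once the server is running, `request.urlopen(req)` sends the query and returns the retrieved pages as JSON.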

### 3. Model Download
Download the necessary models from Hugging Face; they are required for the workflow and should be stored locally.
- Agentic-Router:
```bash
hf download <agentic-router-repo> --local-dir ./models/Agentic-Router --token <your-token>
```
- Qwen2.5-VL-7B-VRAG:
```bash
hf download autumncc/Qwen2.5-VL-7B-VRAG --local-dir ./models/Qwen2.5-VL-7B-VRAG
```
- Patho-R1:
```bash
hf download WenchuanZhang/Patho-R1-7B --local-dir ./models/Patho-R1-7B --token <your-token>
```
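
Before serving, it can help to confirm that each download actually landed under `./models`. A small sanity-check helper (illustrative, not part of the repo) that flags missing or empty model directories:

```python
# Sanity check: report which expected model directories under ./models are
# missing or empty. This helper is illustrative, not part of Patho-AgenticRAG.
from pathlib import Path

def missing_models(root: str, names: list[str]) -> list[str]:
    """Return the names whose directory under `root` is absent or empty."""
    missing = []
    for name in names:
        model_dir = Path(root) / name
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing

print(missing_models("./models", ["Qwen2.5-VL-7B-VRAG", "Patho-R1-7B"]))
```

An empty list means every listed model is in place.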

### 4. Serving the Models
You can now serve the models for inference using the following commands:
- Agentic Router (on CUDA device 1):
```bash
CUDA_VISIBLE_DEVICES=1 vllm serve ./models/<agentic-router> --port <port> --host 0.0.0.0
```
- Qwen2.5-VL-7B-VRAG (on CUDA devices 2 and 3):
```bash
CUDA_VISIBLE_DEVICES=2,3 vllm serve ./models/Qwen2.5-VL-7B-VRAG --port 8003 --host 0.0.0.0
```
- Patho-R1 (on CUDA devices 4 and 5):
```bash
CUDA_VISIBLE_DEVICES=4,5 python3 -m vllm.entrypoints.openai.api_server --model ./models/Patho-R1-7B --tokenizer ./models/Patho-R1-7B --port 8004 --host 0.0.0.0 --served-model-name Patho-R1 --tensor-parallel-size 2
```
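
With the servers up, each model speaks vLLM's OpenAI-compatible API. A sketch of querying the served Patho-R1 — port 8004 and the model name `Patho-R1` come from the serve command above, while the sample question is only a placeholder:

```python
# Build a chat-completion request for the Patho-R1 server started above.
# Port 8004 and the model name "Patho-R1" match the vLLM serve command;
# the question text is a placeholder.
import json
from urllib import request

def build_chat_request(question: str,
                       base_url: str = "http://localhost:8004/v1") -> request.Request:
    payload = {
        "model": "Patho-R1",  # must match --served-model-name
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Summarize the key histologic features of adenocarcinoma.")
print(req.full_url)  # http://localhost:8004/v1/chat/completions
```

Sending it with `request.urlopen(req)` returns the usual OpenAI-style JSON, with the answer under `choices[0].message.content`.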

### 5. Running the Demo
Finally, run the Patho-AgenticRAG script for a demo:
```bash
python patho_agenticrag.py
```