WenchuanZhang committed on
Commit
7605441
verified
1 Parent(s): 078e50b

Update README.md

Files changed (1): README.md (+9 -6)
README.md CHANGED
@@ -11,6 +11,9 @@ tags:
 - Agent
 - arxiv:2508.02258
 ---
+# Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning
+\[[Arxiv](https://arxiv.org/abs/2508.02258)\] | \[[Github Repo](https://github.com/Wenchuan-Zhang/Patho-AgenticRAG)\] | \[[Cite](#citation)\]
+
 ## Introduction📝
 **Vision Language Models** have demonstrated significant potential in medical imaging tasks, but pathology presents unique challenges due to its ultra-high resolution, complex tissue structures, and nuanced clinical semantics. These challenges often lead to **hallucinations** in VLMs, where the outputs are inconsistent with the visual evidence, undermining clinical trust. Current **Retrieval-Augmented Generation (RAG)** approaches predominantly rely on text-based knowledge bases, limiting their ability to effectively incorporate critical visual information from pathology images.
 
@@ -18,19 +21,19 @@ To address these challenges, we introduce **Patho-AgenticRAG**, a **multimodal R
 
 Our experiments demonstrate that Patho-AgenticRAG significantly outperforms existing multimodal models in many tasks such as multiple-choice diagnosis and visual question answering.
 ![Patho-AgenticRAG Overview](https://github.com/Wenchuan-Zhang/Patho-AgenticRAG/raw/main/docs/casestudy.png)
-### Quickstart🏃
+## Quickstart🏃
 This document outlines the workflow for setting up and running the **Patho-AgenticRAG** framework. The process involves the ingestion of pathology pdf images, model downloads, and serving the models for inference via API servers. Below are the steps to follow:
-## 1. Milvus Ingestion
+### 1. Milvus Ingestion
 To ingest pathology images into Milvus for searching:
 ```bash
 python milvus_ingestion.py
 ```
-## 2. Milvus Search Engine API
+### 2. Milvus Search Engine API
 Next, run the Milvus search engine API to handle the retrieval process:
 ```bash
 python milvus_search_engine_api.py
 ```
-## 3. Model Download
+### 3. Model Download
 Download the necessary models from Hugging Face. These models are critical for the workflow and should be stored locally.
 - Agentic-Router:
 ```bash
@@ -44,7 +47,7 @@ hf download autumncc/Qwen2.5-VL-7B-VRAG --local-dir ./models/Qwen2.5-VL-7B-VRAG
 ```bash
 hf download WenchuanZhang/Patho-R1-7B --local-dir ./models/Patho-R1-7B --token <your-token>
 ```
-## 4. Serving the Models
+### 4. Serving the Models
 You can now serve the models for inference using the following commands:
 - Agentic Router (on CUDA device 1):
 ```bash
@@ -58,7 +61,7 @@ CUDA_VISIBLE_DEVICES=2,3 vllm serve ./models/Qwen2.5-VL-7B-VRAG --port 8003 --ho
 ```bash
 CUDA_VISIBLE_DEVICES=4,5 python3 -m vllm.entrypoints.openai.api_server --model ./models/Patho-R1-7B --tokenizer ./models/Patho-R1-7B --port 8004 --host 0.0.0.0 --served-model-name Patho-R1 --tensor-parallel-size 2
 ```
-## 5. Running the Demo
+### 5. Running the Demo
 Finally, run the Patho-AgenticRAG script for a demo:
 ```bash
 python patho_agenticrag.py
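For reference, the models served in step 4 expose vLLM's OpenAI-compatible chat-completions API. A minimal client sketch for the Patho-R1 server follows; this is not code from the repository, and the endpoint path, the port 8004 (taken from the serve command above), and the example question are assumptions:

```python
import json
import urllib.request

# Patho-R1 is served with --port 8004 --served-model-name Patho-R1,
# so the OpenAI-compatible chat-completions endpoint lives here.
BASE_URL = "http://localhost:8004/v1/chat/completions"


def build_request(question: str, model: str = "Patho-R1") -> urllib.request.Request:
    """Build a chat-completions POST request for the served Patho-R1 model."""
    payload = {
        "model": model,  # must match the --served-model-name used at serve time
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 512,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the model's answer (needs the server up)."""
    with urllib.request.urlopen(build_request(question), timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ask(...)` requires the step-4 server to be running; the other served models follow the same pattern with their own ports and `--served-model-name` values.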