OpenFace-CQUPT/Pathology-LLaVA

OpenFace-CQUPT committed on Aug 20, 2024

Commit cad1498 · verified · 1 Parent(s): d334b38

Update README.md

Files changed (1)

README.md +2 -2

@@ -21,13 +21,13 @@ We developed a domain-speciffc large language-vision assistant (PA-LLaVA) for pa
 ### Introduction
 These public datasets contain substantial amounts of data unrelated to human pathology. To obtain the human pathology image-text data, we performed two cleaning processes on the raw data, as illustrated in the follow figture: (1) Removing nonpathological images. (2) Removing nonhuman pathology data. Additionally, we excluded image-text pairs with textual descriptions of fewer than 20 words. Ultimately, we obtained 518,413 image-text pairs (named "PCaption-0.5M" ) for the aligned training dataset.
-Instruction fine-tuning phase we only cleaned PMC-VQA in the same way and obtained 15,788 question-answer pairs related to human pathology. Lastly, we combined PathVQA and Human pathology data obtained from PMC-VQA, thereby constructing a dataset of 35543 question-answer pairs.data.
+Instruction fine-tuning phase we only cleaned PMC-VQA in the same way and obtained 15,788 question-answer pairs related to human pathology. Lastly, we combined PathVQA and Human pathology data obtained from PMC-VQA, thereby constructing a dataset of 35543 question-answer pairs data.
 #### Data Cleaning Process
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/IAeFWhH8brZYDaTJnew2N.png)
-### Get the Dataset
+## Get the Dataset
 ### Step 1 Download the public datasets.
 Here we only provide the download link for the public dataset and expose the image id index of our cleaned dataset on HuggingFace.