---
license: cc
---

# PA-LLaVA-plus

These are the first-stage weights, trained on the 400w pathology image-text dataset using the PA-LLaVA model structure. The weights are available at https://huggingface.co/OpenFace-CQUPT/PA-LLaVA-plus.

## 400w Dataset

The 400w (about 4 million samples) pathology dataset is derived from the publicly available "Accessible Dataset (18M samples)" subset of MedTrinity-25M ([UCSC-VLAA/MedTrinity-25M · Datasets at Hugging Face](https://huggingface.co/datasets/UCSC-VLAA/MedTrinity-25M)), a dataset spanning multiple medical fields. By analyzing the linguistic structure of the text in this dataset, we extracted a 400w subset specific to the pathology domain.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/JEMAyDGi9uRUsWGo6UBTA.png)

## Citation

```
@INPROCEEDINGS{10821785,
  author={Dai, Dawei and Zhang, Yuanhui and Xu, Long and Yang, Qianlan and Shen, Xiaojing and Xia, Shuyin and Wang, Guoyin},
  booktitle={2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  title={PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding},
  year={2024},
  volume={},
  number={},
  pages={3138-3143},
  keywords={Connectors;Pathology;Visualization;Codes;Computational modeling;Biological system modeling;Data models;Cleaning;Bioinformatics;Biomedical imaging;Pathology Image Understanding;VQA;LLaVA},
  doi={10.1109/BIBM62325.2024.10821785}}

@article{dai2025pathologyvlm,
  title={Pathologyvlm: a large vision-language model for pathology image understanding},
  author={Dai, Dawei and Zhang, Yuanhui and Yang, Qianlan and Xu, Long and Shen, Xiaojing and Xia, Shuyin and Wang, Guoyin},
  journal={Artificial Intelligence Review},
  volume={58},
  number={6},
  pages={1--19},
  year={2025},
  publisher={Springer}
}
```
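
## Downloading the weights

This card does not describe a loading procedure, so the following is only a minimal sketch of one way to fetch the files from the link given at the top. It assumes the `huggingface_hub` package is installed; the helper names `weights_url` and `download_weights` and the `PA-LLaVA-plus` destination folder are hypothetical choices for illustration, not part of the authors' code.

```python
"""Sketch: fetch the PA-LLaVA-plus first-stage weights from the Hugging Face Hub."""

# Repository id taken from the link in this README.
REPO_ID = "OpenFace-CQUPT/PA-LLaVA-plus"


def weights_url(repo_id: str = REPO_ID) -> str:
    """Return the Hub page URL for a model repository id."""
    return f"https://huggingface.co/{repo_id}"


def download_weights(local_dir: str = "PA-LLaVA-plus") -> str:
    """Download every file in the repository and return the local path.

    Assumption: `huggingface_hub` is installed (pip install huggingface_hub).
    `local_dir` is a hypothetical destination folder chosen for this sketch.
    """
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` should place the repository files under `PA-LLaVA-plus/` and return that path. How the checkpoint is then loaded depends on the PA-LLaVA code, which this card does not specify.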