# Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning

<details>
<summary>
<b>Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning</b>, Nat. Biomed. Eng (2022).
<a href="https://www.nature.com/articles/s41551-022-00936-9" target="_blank">[Paper]</a>
<br><em><a href="https://www.linkedin.com/in/ekin-tiu-0aa467200/">Ekin Tiu</a>, <a href="https://www.linkedin.com/in/ellie-talius/">Ellie Talius</a>, <a href="https://www.linkedin.com/in/pujanpatel24/">Pujan Patel</a>, <a href="https://med.stanford.edu/profiles/curtis-langlotz">Curtis P. Langlotz</a>, <a href="https://www.andrewng.org/">Andrew Y. Ng</a>, <a href="https://pranavrajpurkar.squarespace.com/">Pranav Rajpurkar</a></em><br>
</summary>

```bash
Tiu, E., Talius, E., Patel, P. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng (2022). https://doi.org/10.1038/s41551-022-00936-9
```
</details>

<img width="848" alt="Screen Shot 2022-09-15 at 10 57 16 AM" src="https://user-images.githubusercontent.com/12751529/190451160-a919b363-6005-4cd4-9633-b194392bd728.png">

This repository contains code to train a self-supervised learning model on chest X-ray images that lack explicit annotations and to evaluate the model's performance on pathology-classification tasks.

<details>
<summary>
<b>Main Findings</b>
</summary>

1. **Automatically detecting pathologies in chest x-rays without explicit annotations:** Our method learns directly from the combination of images and unstructured radiology reports, thereby avoiding time-consuming labeling efforts. The resulting model can predict multiple pathologies and differential diagnoses that it had not explicitly seen during training.
2. **Matching radiologist performance on an external test set:** Our method performed on par with radiologists when evaluated on an external validation set (CheXpert) of chest x-ray images labeled for the presence of 14 different conditions by multiple radiologists.
3. **Outperforming approaches trained on explicitly labeled data:** Using no labels, we outperformed a fully supervised approach (trained on 100% of labels) on 3 of 8 selected pathologies on a dataset (PadChest) collected in a different country. We further demonstrated high performance (AUC > 0.9) on 14 findings and an AUC of at least 0.700 on 53 of the 107 radiographic findings the method had not seen during training.
</details>

## Dependencies
To clone all files:

```bash
git clone https://github.com/rajpurkarlab/CheXzero.git
```

To install Python dependencies:

```bash
pip install -r requirements.txt
```

## Data
### Training Dataset
1. Download images from [MIMIC-CXR JPG](https://physionet.org/content/mimic-cxr-jpg/2.0.0/) and reports from the [MIMIC-CXR Database](https://physionet.org/content/mimic-cxr/2.0.0/). Note: to gain access to the data, you must be a credentialed user as defined on [PhysioNet](https://physionet.org/settings/credentialing/).
2. Copy the dataset into the `data/` directory.
3. Run `python run_preprocess.py`.
4. This preprocesses the chest x-ray images into a Hierarchical Data Format (HDF5) file used for training, stored at `data/cxr.h5`, and extracts the impressions section of each corresponding radiology report, stored at `data/mimic_impressions.csv` (a quick sanity check follows below).
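As a quick sanity check on the preprocessing outputs, the sketch below opens both files. The `cxr` dataset key is an assumption about how `run_preprocess.py` names the HDF5 dataset; inspect the printed keys if yours differs.

```python
# Minimal sanity check of the preprocessing outputs (the "cxr" dataset
# key is assumed; check f.keys() if your version names it differently).
import h5py
import pandas as pd

with h5py.File("data/cxr.h5", "r") as f:
    print(list(f.keys()))      # available dataset keys
    print(f["cxr"].shape)      # e.g. (num_images, height, width)

impressions = pd.read_csv("data/mimic_impressions.csv")
print(impressions.head())      # one impression per chest x-ray report
```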

### Evaluation Dataset

#### CheXpert Dataset
The CheXpert dataset consists of chest radiographic examinations from Stanford Hospital, performed between October 2002 and July 2017 in both inpatient and outpatient centers. Population-level characteristics are unavailable for the CheXpert test dataset, as they are used for official evaluation on the CheXpert leaderboard.

The main CheXpert data supporting the results of this study are available at https://aimi.stanford.edu/chexpert-chest-x-rays.

The CheXpert **test** dataset has recently been made public and can be obtained by following the steps in the [cheXpert-test-set-labels](https://github.com/rajpurkarlab/cheXpert-test-set-labels) repository.

#### PadChest Dataset
The PadChest dataset contains chest X-rays that were interpreted by 18 radiologists at the Hospital Universitario de San Juan, Alicante, Spain, from January 2009 to December 2017. The dataset contains 109,931 image studies, 168,861 images, and 206,222 study reports.

The [PadChest](https://arxiv.org/abs/1901.07441) dataset is publicly available at https://bimcv.cipf.es/bimcv-projects/padchest; those who would like to use it for experimentation should request access at that link.

### Model Checkpoints
Model checkpoints of CheXzero pre-trained on MIMIC-CXR are publicly available at the following [link](https://drive.google.com/drive/folders/1makFLiEMbSleYltaRxw81aBhEDMpVwno?usp=sharing). Download the files and save them in the `./checkpoints/chexzero_weights` directory.

## Running Training
Run the following command to perform CheXzero pretraining:
```bash
python run_train.py --cxr_filepath "./data/cxr.h5" --txt_filepath "data/mimic_impressions.csv"
```

### Arguments
* `--cxr_filepath` Path to the `.h5` file containing the chest x-ray image data.
* `--txt_filepath` Path to the `.csv` file containing the radiology report impressions text.

Use the `-h` flag to see all optional arguments.

## Zero-Shot Inference
See the following [notebook](https://github.com/rajpurkarlab/CheXzero/blob/main/notebooks/zero_shot.ipynb) for an example of how to use CheXzero to perform zero-shot inference on a chest x-ray dataset. The example shows how to output predictions from the model ensemble and how to evaluate the model's performance when ground truth labels are available.

```python
import zero_shot

# Compute predictions for a set of images, returned as a numpy array of
# probabilities for each pathology.
predictions, y_pred_avg = zero_shot.ensemble_models(
    model_paths=model_paths,
    cxr_filepath=cxr_filepath,
    cxr_labels=cxr_labels,
    cxr_pair_template=cxr_pair_template,
    cache_dir=cache_dir,
)
```
### Arguments
* `model_paths: List[str]`: List of paths to all checkpoints to be used in the ensemble. To run a single model, pass a list containing a single path.
* `cxr_filepath: str`: Path to the `.h5` file containing the images.
* `cxr_labels: List[str]`: List of pathologies to query in each image.
* `cxr_pair_template: Tuple[str, str]`: Contrasting templates used to query the model (see Figure 1 in the article for a visual explanation).
* `cache_dir: str`: Directory in which to cache the predictions of each checkpoint; use this to avoid recomputing predictions.
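
The arguments above are defined in the notebook; the values below are an illustrative sketch only (the checkpoint glob, file paths, label list, and prompt template are assumptions to adapt to your setup):

```python
import glob

# Illustrative argument values; adjust paths and labels to your setup.
model_paths = sorted(glob.glob("checkpoints/chexzero_weights/*.pt"))  # downloaded checkpoints
cxr_filepath = "data/chexpert_test.h5"
cxr_labels = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]
# Contrasting positive/negative templates; "{}" is filled with each pathology.
cxr_pair_template = ("{}", "no {}")
cache_dir = "cache/"
```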

In order to use CheXzero for zero-shot inference, ensure the following requirements are met:
* All input *`images`* must be stored in a single `.h5` (Hierarchical Data Format) file. See the [`img_to_h5`](https://github.com/rajpurkarlab/CheXzero/blob/main/preprocess_padchest.py#L156) function in [preprocess_padchest.py](https://github.com/rajpurkarlab/CheXzero/blob/main/preprocess_padchest.py) for an example of how to convert a list of paths to `.png` files into a valid `.h5` file; a minimal sketch follows this list.
* The *ground truth `labels`* must be in a `.csv` dataframe where each row represents an image sample and each column holds the binary label for a particular pathology.
* Ensure all [model checkpoints](https://drive.google.com/drive/folders/1makFLiEMbSleYltaRxw81aBhEDMpVwno?usp=sharing) are stored in `checkpoints/chexzero_weights/`, or in the `model_dir` specified in the notebook.
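
A minimal sketch of such a conversion, assuming grayscale images, a fixed square resize, and a `cxr` dataset key (the repository's `img_to_h5` may differ in these details):

```python
import h5py
import numpy as np
from PIL import Image

def images_to_h5(png_paths, out_path, size=320):
    """Pack a list of .png chest x-rays into a single HDF5 file."""
    with h5py.File(out_path, "w") as f:
        dset = f.create_dataset("cxr", shape=(len(png_paths), size, size), dtype="float32")
        for i, path in enumerate(png_paths):
            img = Image.open(path).convert("L").resize((size, size))  # grayscale, square
            dset[i] = np.asarray(img, dtype="float32")
```

For example, `images_to_h5(sorted(glob.glob("pngs/*.png")), "data/custom.h5")`.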

## Evaluation
Given a numpy array of predictions (obtained from zero-shot inference) and a numpy array of ground truth labels, one can evaluate the performance of the model using the following code:
```python
from typing import Tuple

import pandas as pd

import zero_shot
import eval

# Load the ground truth labels into memory.
test_pred = y_pred_avg
test_true = zero_shot.make_true_labels(cxr_true_labels_path=cxr_true_labels_path, cxr_labels=cxr_labels)

# Evaluate the model, without bootstrapping.
cxr_results: pd.DataFrame = eval.evaluate(test_pred, test_true, cxr_labels)  # eval on full test dataset

# Bootstrap evaluations for 95% confidence intervals.
bootstrap_results: Tuple[pd.DataFrame, pd.DataFrame] = eval.bootstrap(test_pred, test_true, cxr_labels)  # (df of results for each bootstrap, df of CIs)

# Print the results with confidence intervals.
print(bootstrap_results[1])
```
The results are returned as `pd.DataFrame`s, which can be saved as `.csv` files.
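For example (the output filename is arbitrary):

```python
# Persist the 95% confidence-interval summary for later inspection.
bootstrap_results[1].to_csv("bootstrap_ci.csv")
```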

### CheXpert Test Dataset
In order to replicate the results in the paper, zero-shot inference and evaluation can be performed on the now publicly available CheXpert test dataset.
1) Download the labels at [cheXpert-test-set-labels](https://github.com/rajpurkarlab/cheXpert-test-set-labels/blob/main/groundtruth.csv) and the image files from [Stanford AIMI](https://stanfordaimi.azurewebsites.net/datasets/23c56a0d-15de-405b-87c8-99c30138950c), and save them in the `./data` directory in `CheXzero/`. The test dataset images should have the following directory structure:
```
data/
├── CheXpert/
│   ├── test/
│   │   ├── patient64741/
│   │   │   ├── study1/
│   │   │   │   ├── view1_frontal.jpg
│   │   ├── .../
```

2) Run the `run_preprocess.py` script with the following arguments:
```bash
python run_preprocess.py --dataset_type "chexpert-test" --cxr_out_path "./data/chexpert_test.h5" --chest_x_ray_path "./data/CheXpert/test/"
```
This should save an `.h5` version of the test dataset images, which can be used for evaluation.

3) Open the sample zero-shot [notebook](https://github.com/rajpurkarlab/CheXzero/blob/main/notebooks/zero_shot.ipynb) and run all cells. If the directory structure is set up correctly, all cells should run without errors.

## Issues
Please open a new issue thread specifying the issue with the codebase, or report issues directly to ekintiu@stanford.edu.

## Citation
```bash
Tiu, E., Talius, E., Patel, P. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng (2022). https://doi.org/10.1038/s41551-022-00936-9
```

## License
The source code for this repository is licensed under the MIT license, which you can find in the `LICENSE` file. Also see `NOTICE.md` for attributions to third-party sources.