	---
	license: agpl-3.0
	pipeline_tag: image-segmentation
	tags:
	- medical
	- biology
	---

	# VascX Fork

	This repository is `zyf0717/vascx-fork`, a personal Hugging Face fork of the original [`Eyened/vascx`](https://huggingface.co/Eyened/vascx) release.

	It was cloned from the upstream VascX repository on April 20, 2026, and all work in this fork continues from that baseline.

	It now serves as a self-contained fork for running the VascX retinal fundus analysis pipeline, with:

	- the VascX model weights tracked in Git LFS
	- the Python package used to run preprocessing and inference from this repo
	- fork-specific packaging and runtime fixes
	- a root `config.yaml` for controlling overlay layers, colors, disc circles, and vessel-width sampling

	This is not the canonical upstream repository. The upstream project remains `Eyened/vascx`.

	## What Stays Compatible

	- The Python package name is still `vascx_models`
	- The model layout and output structure are kept compatible with the upstream VascX workflow

	## What Changed In This Fork

	- Repository identity is now `vascx-fork`
	- The default conda environment name in `environment.yml` and `run.sh` is `vascx-fork`
	- The legacy `setup.py` and installed `vascx` console script were removed
	- Supported entrypoints are `./run.sh` and `python -m vascx_models`
	- Overlay generation can now be configured from the root `config.yaml`
	- Inference device selection is automatic by default and can be overridden explicitly
	- Local helper scripts and docs were updated to point at this fork instead of the upstream Hub repo
	- Generated outputs, caches, and other non-repository artifacts are excluded from version control

	## Installation

	1. Install Git LFS and enable it for your machine:

	```bash
	git lfs install
	```

	2. Create an environment. The included environment file uses the fork name:

	```bash
	conda env create -f environment.yml
	conda activate vascx-fork
	```

	If you update `environment.yml` later, refresh the environment with:

	```bash
	conda env update -f environment.yml --prune
	```

	If you are managing your own environment instead of using `environment.yml`, install `torch`, `torchvision`, `retinalysis-fundusprep`, and `retinalysis-inference` before running the package.

	## Quick Start

	Run the full pipeline:

	```bash
	./run.sh
	```

	`run.sh` activates the `vascx-fork` conda environment, defaults to the bundled sample images, and writes to a timestamped `output_YYYYMMDD_HHMMSS/` directory. You can override the main runtime inputs with environment variables:

	```bash
	INPUT_PATH=/path/to/images OUTPUT_PATH=/path/to/output N_JOBS=4 ./run.sh
	DEVICE=cpu INPUT_PATH=/path/to/images OUTPUT_PATH=/path/to/output ./run.sh
	./run.sh --sample-run
	```

	The standard Python entrypoint is:

	```bash
	python -m vascx_models run DATA_PATH OUTPUT_PATH
	```

	Both entrypoints auto-configure the local cache and model-release directories from the repository checkout.

	`DATA_PATH` can be:

	- a directory of fundus images
	- a CSV file with a `path` column
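
The two accepted `DATA_PATH` forms can be illustrated with a minimal sketch. It is not the package's own loader: the function name `list_input_images` and the `IMAGE_SUFFIXES` set are hypothetical and exist only for this example.

```python
import csv
from pathlib import Path

# Assumption: this extension list is illustrative, not the package's actual filter.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_input_images(data_path):
    """Return image paths from a directory, or from a CSV with a `path` column."""
    data_path = Path(data_path)
    if data_path.is_dir():
        # Directory form: keep regular files with a known image extension.
        return sorted(
            p for p in data_path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    if data_path.suffix.lower() == ".csv":
        # CSV form: read the `path` column and ignore any other columns.
        with data_path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            if "path" not in (reader.fieldnames or []):
                raise ValueError(f"{data_path} has no 'path' column")
            return [Path(row["path"]) for row in reader]
    raise ValueError(f"{data_path} is neither a directory nor a CSV file")
```

A helper like this can be handy for checking an image list before starting a long run.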

	Typical examples:

	```bash
	./run.sh --sample-run
	python -m vascx_models run /path/to/images /path/to/output
	python -m vascx_models run /path/to/image_list.csv /path/to/output
	python -m vascx_models run /path/to/preprocessed/images /path/to/output --no-preprocess
	python -m vascx_models run /path/to/images /path/to/output --device auto
	python -m vascx_models run /path/to/images /path/to/output --device cpu
	python -m vascx_models run /path/to/images /path/to/output --no-disc --no-quality --no-fovea --no-overlay
	python -m vascx_models run /path/to/images /path/to/output --no-vessels
	```

	## Device Selection

	Inference device selection is automatic by default.

	- `--device auto` is the default for `python -m vascx_models run`
	- `DEVICE=auto` is the default for `./run.sh`
	- Auto-selection priority is `cuda` first, then Apple Metal `mps`, then `cpu`
	- The CLI logs detected availability as `cuda=...`, `mps=...`, `cpu=True`
	- The CLI also logs the selected device for each run
	- You can force a backend with `--device cuda`, `--device mps`, or `--device cpu`
	- `./run.sh` forwards the `DEVICE` environment variable to the Python CLI
	- If you request `cuda` or `mps` explicitly and that backend is unavailable, the run exits with a clear error instead of silently falling back
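
The priority and failure rules above can be sketched as a small pure function. This is an illustration, not the fork's actual code: `resolve_device` and its parameters are hypothetical names. The availability flags are passed in so the example runs without PyTorch installed.

```python
def resolve_device(requested="auto", cuda_available=False, mps_available=False):
    """Pick an inference device following the cuda > mps > cpu priority.

    In a PyTorch program the flags would typically come from
    torch.cuda.is_available() and torch.backends.mps.is_available().
    """
    # Insertion order encodes the auto-selection priority; cpu is always available.
    available = {"cuda": cuda_available, "mps": mps_available, "cpu": True}
    if requested == "auto":
        return next(name for name, ok in available.items() if ok)
    if requested not in available:
        raise ValueError(f"Unknown device: {requested!r}")
    if not available[requested]:
        # An explicit request never falls back silently.
        raise RuntimeError(f"Requested device {requested!r} is not available")
    return requested
```

Raising on an unavailable explicit request, rather than falling back, keeps a run that was meant for a GPU from quietly completing on CPU.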

	## Configuration

	This fork adds a root-level `config.yaml` for overlay behavior, disc-circle generation, and vessel-width sampling.

	A `config.yaml` in the current working directory takes precedence. If none exists there, the repository-root `config.yaml` is used when present. You can also pass a specific file:

	```bash
	python -m vascx_models run DATA_PATH OUTPUT_PATH --config /path/to/config.yaml
	```
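
The lookup order can be sketched as follows. The function name `find_config` and its parameters are hypothetical, and treating an explicit `--config` path as overriding both default locations is an assumption, not something this README states.

```python
from pathlib import Path


def find_config(explicit=None, cwd=None, repo_root=None):
    """Return the config path to load, or None when no config file is found."""
    if explicit is not None:
        # Assumption: an explicitly passed file wins over both default locations.
        return Path(explicit)
    bases = [Path(cwd) if cwd is not None else Path.cwd()]
    if repo_root is not None:
        bases.append(Path(repo_root))
    for base in bases:
        candidate = base / "config.yaml"
        if candidate.is_file():
            # The working directory is checked before the repository root.
            return candidate
    return None
```

A `None` result corresponds to the case where the built-in defaults apply.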

	The repository ships with this `config.yaml`:

	```yaml
	overlay:
	  enabled: true
	  layers:
	    arteries: true
	    veins: true
	    disc: true
	    fovea: true
	    vessel_widths: true
	  colours:
	    artery: "#FF0000"
	    vein: "#0000FF"
	    vessel: "#00FF00"
	    disc: "#FFFFFF"
	    fovea: "#FFFF00"
	    vessel_widths: "#00FF00"
	  circles:
	    - name: "2r"
	      diameter: 2.0
	      color: "#00FF00"
	    - name: "3r"
	      diameter: 3.0
	      color: "#00FF00"
	vessel_widths:
	  inner_circle: "2r"
	  outer_circle: "3r"
	  samples_per_connection: 5
	```

	Notes:

	- `overlay.enabled` sets the default overlay behavior when `--overlay/--no-overlay` is not passed
	- `overlay.layers` controls which predictions are drawn
	- `overlay.colors` and `overlay.colours` are both accepted
	- `overlay.circles` controls how many disc circles are generated and their diameters
	- Each circle entry requires `name` and `diameter`
	- Circle `color` and `colour` are both accepted and default to black when omitted
	- `overlay.layers.vessel_widths` controls whether sampled width segments are drawn in overlays
	- `overlay.colors.vessel_widths` controls the measurement-line color in overlays
	- `vessel_widths.inner_circle` and `vessel_widths.outer_circle` choose the circle pair used for sampling; when omitted, the two smallest valid circles are used
	- `vessel_widths.samples_per_connection` sets how many evenly spaced interior points are measured along each simple vessel path between the two circles
	- Colors can be written as `#RRGGBB` strings or RGB arrays such as `[255, 0, 0]`
	- If no config file is found, the built-in defaults still use `2r` and `3r` circles with vessel-width sampling enabled
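
The two accepted color notations can be normalized to one form, as in this minimal sketch. `parse_color` is a hypothetical helper, not a function from `vascx_models`, and the exact validation the package applies may differ.

```python
def parse_color(value):
    """Convert '#RRGGBB' or a 3-item RGB sequence into an (r, g, b) tuple."""
    if isinstance(value, str):
        text = value.strip()
        if len(text) != 7 or not text.startswith("#"):
            raise ValueError(f"Expected '#RRGGBB', got {value!r}")
        try:
            # Read the three two-digit hexadecimal channels.
            return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))
        except ValueError:
            raise ValueError(f"Invalid hex color: {value!r}") from None
    channels = tuple(int(c) for c in value)
    if len(channels) != 3 or any(not 0 <= c <= 255 for c in channels):
        raise ValueError(f"Expected three channels in 0..255, got {value!r}")
    return channels
```

With this sketch, `"#FF0000"` and `[255, 0, 0]` both map to the same tuple.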

	## Outputs

	With the default pipeline settings, `OUTPUT_PATH` contains:

	```text
	OUTPUT_PATH/
	├── preprocessed_rgb/
	├── vessels/
	├── artery_vein/
	├── disc/
	├── disc_circles/
	├── overlays/
	├── bounds.csv
	├── disc_geometry.csv
	├── vessel_widths.csv
	├── quality.csv
	└── fovea.csv
	```
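
After a run, the layout above can be checked with a few lines of Python. This is a sketch: `missing_outputs` is a hypothetical helper, and the expected names are copied from the tree above, so they only apply to the default pipeline settings.

```python
from pathlib import Path

# Names taken from the default output tree; disabled stages will omit some of them.
EXPECTED_DIRS = [
    "preprocessed_rgb", "vessels", "artery_vein",
    "disc", "disc_circles", "overlays",
]
EXPECTED_FILES = [
    "bounds.csv", "disc_geometry.csv", "vessel_widths.csv",
    "quality.csv", "fovea.csv",
]


def missing_outputs(output_path):
    """Return the expected directories and files that are absent from output_path."""
    root = Path(output_path)
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means every default output is present.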

	`disc_circles/` contains one subdirectory per configured circle name.

	`vessel_widths.csv` is written when both vessel/AV and disc outputs are available. Each row represents one sampled measurement along a retained artery or vein connection between the configured inner and outer circles, with these columns:

	- `image_id`
	- `inner_circle`, `outer_circle`
	- `inner_circle_radius_px`, `outer_circle_radius_px`
	- `connection_index`, `sample_index`
	- `x`, `y`
	- `width_px`
	- `x_start`, `y_start`, `x_end`, `y_end`
	- `vessel_type`
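
As an illustration of working with these columns, the sketch below averages `width_px` per image and vessel type using only the standard library. The helper names are hypothetical, and the `vessel_type` labels in the usage note are assumed values, not confirmed by this README.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(csv_path):
    """Read vessel_widths.csv into a list of dicts keyed by column name."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def mean_width_by_type(rows):
    """Average `width_px` for each (image_id, vessel_type) pair."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["image_id"], row["vessel_type"])
        groups[key].append(float(row["width_px"]))
    return {key: mean(values) for key, values in groups.items()}
```

For example, `mean_width_by_type(load_rows("OUTPUT_PATH/vessel_widths.csv"))` would return a dict with keys such as `("image_001", "artery")`, if those are the labels written by the pipeline.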

	Current measurement behavior is intentionally conservative:

	- only simple open skeleton paths that connect one inner-circle boundary point to one outer-circle boundary point are measured
	- branched, looping, ambiguous, or zero-length annulus components are skipped
	- if a sampled point on a retained connection fails width estimation, that entire connection is dropped from the CSV and overlay
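
One way to picture the "evenly spaced interior points" measured along a retained connection is the sketch below. It assumes the connection is an ordered polyline of `(x, y)` points and that spacing is by arc length at fractions `i / (n + 1)`. Both are assumptions made for illustration, and `interior_samples` is a hypothetical name; the package may place samples differently.

```python
import math


def interior_samples(path, n_samples):
    """Return n_samples points evenly spaced by arc length, excluding both endpoints."""
    if n_samples < 1 or len(path) < 2:
        raise ValueError("need at least one sample and two path points")
    # Cumulative arc length at each vertex of the polyline.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("zero-length path")
    points = []
    segment = 0
    for i in range(1, n_samples + 1):
        target = total * i / (n_samples + 1)
        # Advance to the segment that contains the target arc length.
        while cumulative[segment + 1] < target:
            segment += 1
        span = cumulative[segment + 1] - cumulative[segment]
        t = (target - cumulative[segment]) / span
        (x0, y0), (x1, y1) = path[segment], path[segment + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```

Under these assumptions, `samples_per_connection: 5` on a straight connection yields five points that divide it into six equal parts.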

	## Repository Contents

	- `vascx_models/`: package source and CLI
	- `artery_vein/`, `disc/`, `fovea/`, `vessels/`, `quality/`, `odfd/`, `discedge/`: model artifacts
	- `config.yaml`: fork-specific overlay, disc-circle, and vessel-width configuration
	- `pytest.ini`: pytest marker definitions for slow and end-to-end tests
	- `run.sh`: primary local runner
	- `tests/`: pytest suite
	- `notebooks/`: preprocessing and inference examples

	## Testing

	The test suite includes unit tests, CLI tests, and an opt-in real-model single-image end-to-end smoke test in `tests/test_e2e.py`.

	Useful commands:

	```bash
	conda run -n vascx-fork pytest
	KMP_DUPLICATE_LIB_OK=TRUE conda run -n vascx-fork pytest tests/test_e2e.py -q
	KMP_DUPLICATE_LIB_OK=TRUE VASCX_RUN_E2E=1 conda run -n vascx-fork pytest tests/test_e2e.py -q -k cpu
	```

	Explicitly tested in this fork as of April 21, 2026:

	- README and CLI/config behavior updates are covered by the regular pytest suite
	- device resolution priority and explicit unavailable-device failures are covered by unit tests
	- the real single-image end-to-end pipeline was run successfully on CPU with preprocessing enabled
	- the end-to-end test is parameterized for `cpu`, `cuda`, and `mps`, but actual `cuda` and `mps` execution were not exercised in this workspace because those backends were unavailable

	## Upstream Reference

	Original upstream project:

	- Hugging Face: <https://huggingface.co/Eyened/vascx>
	- Paper: <https://arxiv.org/abs/2409.16016>

	This forked repository:

	- Hugging Face: <https://huggingface.co/zyf0717/vascx-fork>