	---
	license: bsd-3-clause
	pipeline_tag: image-segmentation
	tags:
	- AFM
	- physics
	- biology
	- atomic-force-microscopy
	- microscopy
	- image-processing
	- unet
	- surface-analysis
	- chemistry
	- nanoscience
	---
	# afMLevel-mask-unet

	This U‑Net model masks features in Atomic Force Microscopy (AFM) height maps.
	It outputs a probability mask of the same size as the raw AFM image. The accompanying Python package, [afMLevel](https://github.com/mayatek1/afMLevel),
	then applies a threshold (typically 0.5) to produce a binary mask. This mask can be used in the automated
	levelling routines implemented in the `level_mask_ml()` function in afMLevel, which output levelled images.
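The thresholding step described above can be sketched in a few lines of NumPy. This is only an illustration: `binarise_mask` is a hypothetical helper, not a function from afMLevel, and treating pixels exactly at the threshold as features (`>=`) is an assumption.

```python
import numpy as np

def binarise_mask(prob_mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a per-pixel probability map into a boolean feature mask.

    Hypothetical helper for illustration; not part of the afMLevel API.
    """
    # Pixels at or above the threshold are treated as features (assumed convention).
    return prob_mask >= threshold

# Toy 2 x 3 probability map standing in for a model output.
prob = np.array([[0.1, 0.5, 0.9],
                 [0.4, 0.6, 0.2]])
mask = binarise_mask(prob)
```

Lowering the threshold masks more pixels, which excludes more of the image from the subsequent levelling fit.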

	> Note: afMLevel includes a second model, afMLevel-background-unet,
	> that predicts the noise background, without requiring a masking step. Both
	> models will be described in the accompanying paper. The background model is available
	> here: [https://huggingface.co/Heath-AFM-Lab/afMLevel-background-unet](https://huggingface.co/Heath-AFM-Lab/afMLevel-background-unet)

	## Model Details

	This model is part of the [afMLevel](https://github.com/mayatek1/afMLevel) project.

	### Model Description

	The afMLevel mask model is a 7‑layer U‑Net architecture implemented in PyTorch, designed to segment
	features in AFM height map images so that they can be excluded during traditional levelling routines (plane fits and median line
	subtraction). The network was trained on 256 × 256‑pixel inputs, and therefore expects images of this size at inference time.
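To make the role of the mask concrete, here is a minimal NumPy sketch of a plane fit and a median line subtraction that ignore masked pixels. The function names are hypothetical and the code is not taken from afMLevel; it assumes the mask is `True` on features and that scan lines run along rows.

```python
import numpy as np

def masked_plane_level(height, feature_mask):
    """Subtract a first-order plane fitted only to unmasked (background) pixels."""
    rows, cols = np.indices(height.shape)
    background = ~feature_mask
    # Design matrix [x, y, 1] built from background pixel coordinates only.
    A = np.column_stack([cols[background], rows[background],
                         np.ones(background.sum())])
    coeffs, *_ = np.linalg.lstsq(A, height[background], rcond=None)
    plane = coeffs[0] * cols + coeffs[1] * rows + coeffs[2]
    return height - plane

def masked_median_lines(height, feature_mask):
    """Subtract each scan line's median, computed over background pixels only."""
    masked = np.where(feature_mask, np.nan, height)
    line_medians = np.nanmedian(masked, axis=1, keepdims=True)
    # Fully masked lines have no background estimate; leave them unchanged.
    line_medians = np.nan_to_num(line_medians, nan=0.0)
    return height - line_medians
```

Because the fit never sees the masked pixels, tall features no longer pull the plane or the line medians away from the true background.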

	To apply the model and level AFM images, the afMLevel repository includes tools for:

	- image preprocessing and resizing,
	- running inference,
	- generating binary masks,
	- applying the masks in levelling routines,
	- integrating the model into batch-processing pipelines for HS-AFM videos and image sets.

	### Model Card Information

	- Developed by: Maya Tekchandani
	- Maintained by: Dr Daniel E. Rollins, Dr George R. Heath
	- Principal Investigator: Dr George R. Heath
	- Affiliation: Department of Physics and Astronomy, University of Leeds, UK
	- Funded by:
	  - Maya Tekchandani is supported by a studentship funded by the Engineering and Physical Sciences Research Council and the
	    Biotechnology and Biological Sciences Research Council.
	  - Dr Daniel E. Rollins and Dr George R. Heath are funded by Engineering and Physical Sciences Research Council grant EP/W034735/1.
	- Shared by: [Heath-AFM-Lab](https://heath-afm-lab.github.io/)
	- Model type: U‑Net segmentation model for AFM image feature masking
	- License: BSD‑3‑Clause
	- Finetuned from model: None (trained from scratch)

	### Model Sources

	- Repository: https://github.com/mayatek1/afMLevel
	- Paper: Tekchandani et al., *AFMLevel: Deep learning U-Net Models for
	levelling Atomic Force Microscopy Images and Movies* (in preparation, 2026)
	- Demo: [Demonstration notebooks](https://github.com/mayatek1/afMLevel/tree/main/notebooks)

	## Uses

	This model is designed for use within the [afMLevel](https://github.com/mayatek1/afMLevel/) `mask_model` module.

	### Direct Use

	The [afMLevel](https://github.com/mayatek1/afMLevel/) inference code operates on NumPy arrays, so raw AFM files must first be
	loaded using an external reader such as [playnano](https://github.com/derollins/playNano), [AFMReader](https://github.com/AFM-SPM/AFMReader), or a custom
	loader. Once loaded, the afMLevel package and notebooks handle inference and output of either
	the predicted mask or the levelled image directly.

	The model has been primarily tested on biological AFM data (membranes, proteins, DNA origami,
	lattices, fibres) and is best suited to that context, though it may generalise to other sample types
	with similar imaging characteristics.

	### Downstream Use

	- Integration into [playnano](https://github.com/derollins/playNano), enabling
	end-to-end reading and levelling of high-speed AFM (HS-AFM) movies.
	The `afMLevel` package works as a `playnano` plugin (use the processing step `level_ml_mask`).
	- Preprocessing for segmentation, particle detection, or other AFM analysis tools.

	### Out‑of‑Scope Use

	This model is not intended for:

	- predicting physical or mechanical properties of samples,
	- interpreting AFM contact mechanics,
	- working on specialized AFM modes (KPFM, MFM, FMM, etc.) without validation,
	- non-biological samples, without first validating performance on representative images.

	## Bias, Risks, and Limitations

	- The model was trained on a specific dataset of real AFM height maps; performance may degrade for very different imaging modes, scan sizes, or materials.
	- Extremely noisy scans or those containing jump‑to‑contact instabilities may produce inaccurate mask predictions.
	- The mask model cannot simultaneously threshold both high features (blobs) and
	low features (holes); images containing both may not be levelled optimally. In such cases the
	[background model](https://huggingface.co/Heath-AFM-Lab/afMLevel-background-unet) is recommended instead.
	- Users should visually inspect masks and levelled outputs before scientific interpretation.

	### Recommendations

	- Manually verify a representative subset of levelled images.
	- For images with complex topographies (e.g. three or more distinct height
	planes, or both holes and blobs), consider using the background model
	([`MLBackground`](https://huggingface.co/Heath-AFM-Lab/afMLevel-background-unet)) as an alternative or complement.
	- Avoid applying the model to imaging modes it was not trained on without validation.

	## How to Get Started with the Model

	Use the model through the [afMLevel](https://github.com/mayatek1/afMLevel) repository, which handles inference,
	binary mask generation and levelling. Demonstration notebooks are available
	[within the GitHub repository](https://github.com/mayatek1/afMLevel/tree/main/notebooks).

	## Training Details

	The model was trained from scratch on real AFM topography data using the PyTorch framework.

	### Training Data

	This model was trained on a dataset of 2,001 real AFM height‑map images from several AFM labs, spanning
	a range of biological sample types (membranes, proteins, DNA origami, lattices, fibres), AFM instruments (JPK,
	Bruker/RIBM, Asylum), and image features (blobs, holes, fibres, strong line noise, multiple planes).

	To increase dataset size and improve generalization, images were augmented using:

	- reflection along the y-axis,
	- rotation by 180°,
	- synthetic line-noise artefacts.

	This produced 10,005 training images for the mask model.
	An 80:20 train‑validation split was used.
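The augmentations and dataset figures above can be illustrated with a short NumPy sketch. The helper names, the reading of "reflection along the y-axis" as a left-to-right flip, and the line-noise model (a random offset per scan line) are all assumptions for illustration; the card does not state how the three augmentations were combined to give five images per source image, nor how the split was rounded.

```python
import numpy as np

def reflect_y(img):
    """Mirror the image about the vertical (y) axis, i.e. flip left to right."""
    return np.fliplr(img)

def rotate_180(img):
    """Rotate the image by 180 degrees."""
    return np.rot90(img, 2)

def add_line_noise(img, rng, scale=0.05):
    """Add a random offset to every scan line (row) to imitate line noise.

    The offset model and `scale` are illustrative assumptions.
    """
    offsets = rng.normal(0.0, scale, size=(img.shape[0], 1))
    return img + offsets

rng = np.random.default_rng(0)
img = rng.random((256, 256))
variants = [reflect_y(img), rotate_180(img), add_line_noise(img, rng)]

# Bookkeeping implied by the figures quoted above.
n_source, n_total = 2001, 10005
copies_per_image = n_total // n_source   # 5 images per source image
n_val = round(0.2 * n_total)             # 2001 under an exact 80:20 split
n_train = n_total - n_val                # 8004
```

For a segmentation target, the geometric transforms (reflection and rotation) would need to be applied identically to the label mask, whereas added line noise changes only the input image.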

	Ground-truth labels were derived by applying Otsu thresholding to manually levelled images.
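For readers unfamiliar with Otsu's method, the following is a minimal NumPy sketch of how a label mask could be derived from a levelled image. It is not the authors' labelling code: the bin count, the choice of returning a bin edge, and the assumption that features lie above the threshold (blobs rather than holes) are illustrative.

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Return a threshold that maximises between-class variance (Otsu's method)."""
    counts, edges = np.histogram(image.ravel(), bins=nbins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    counts = counts.astype(float)
    # Cumulative pixel counts and intensity sums for the lower class
    # when the split is placed after each histogram bin.
    w0 = np.cumsum(counts)
    w1 = w0[-1] - w0
    s0 = np.cumsum(counts * centres)
    valid = (w0 > 0) & (w1 > 0)
    mu0 = s0 / np.maximum(w0, 1.0)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    best = int(np.argmax(between))
    # Use the right edge of the best bin so that `image >= threshold`
    # selects exactly the upper class.
    return edges[best + 1]

def labels_from_levelled(levelled):
    """Boolean label mask; assumes features are higher than the background."""
    return levelled >= otsu_threshold(levelled)
```

For images whose features are holes, the comparison would be inverted so that pixels below the threshold are labelled.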

	### Training Procedure

	- Architecture: 7‑layer U‑Net with large convolutional filters (7×7)
	- Framework: PyTorch
	- Optimizer: Adam
	- Learning rate: 0.0005
	- Objective: per-pixel classification
	- Activation function: ReLU
	- Batch normalisation: applied after each convolutional layer
	- Dropout: none (p = 0.0)
	- Batch size: 32
	- Hardware: trained using GPU acceleration
	- Training images: 10,005 (mask model)
	- Train/validation split: 80:20 (random)
	- Loss functions used: Binary Cross-Entropy (BCE) and Dice loss (combined as a single loss: BCE + Dice)
	- Loss‑curve diagnostics were used to monitor convergence.
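As a numerical illustration of the combined BCE + Dice objective listed above, the sketch below computes both terms in NumPy. It is written in NumPy only to show the arithmetic; the model itself was trained in PyTorch, and the smoothing constant, the equal weighting of the two terms, and the reduction over pixels are assumptions rather than details taken from the training code.

```python
import numpy as np

def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(prob)
                           + (1.0 - target) * np.log(1.0 - prob))))

def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss: 1 minus the (smoothed) Dice overlap coefficient."""
    intersection = np.sum(prob * target)
    dice = (2.0 * intersection + smooth) / (np.sum(prob) + np.sum(target) + smooth)
    return float(1.0 - dice)

def combined_loss(prob, target):
    """Unweighted sum of the two terms (weighting is an assumption)."""
    return bce_loss(prob, target) + dice_loss(prob, target)

target = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = combined_loss(target, target)
uncertain = combined_loss(np.full((2, 2), 0.5), target)
inverted = combined_loss(1.0 - target, target)
```

BCE penalises each pixel independently, while the Dice term scores overlap over the whole mask, which tends to help when features occupy only a small fraction of the image.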

	#### Preprocessing

	At inference, input images are preprocessed in the same way as the training data:

	1. An initial 1st-order plane fit is applied in x and y.
	2. Min-max normalisation to [0, 1].
	3. Resizing to 256 × 256.

	For the mask model, nearest-neighbour interpolation was used for resizing.
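The three preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the afMLevel implementation: the function names are hypothetical, and the nearest-neighbour index convention (`floor(i * n / size)`) is one of several in common use.

```python
import numpy as np

def plane_level(height):
    """Subtract a least-squares first-order plane fitted in x and y."""
    rows, cols = np.indices(height.shape)
    A = np.column_stack([cols.ravel(), rows.ravel(), np.ones(height.size)])
    coeffs, *_ = np.linalg.lstsq(A, height.ravel(), rcond=None)
    return height - (A @ coeffs).reshape(height.shape)

def minmax_normalise(img):
    """Rescale values to [0, 1]; a constant image maps to all zeros."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)

def resize_nearest(img, size=256):
    """Nearest-neighbour resize to size x size by index lookup."""
    row_idx = np.arange(size) * img.shape[0] // size
    col_idx = np.arange(size) * img.shape[1] // size
    return img[np.ix_(row_idx, col_idx)]

def preprocess(height, size=256):
    """Plane fit, then min-max normalisation, then resize (order as listed above)."""
    return resize_nearest(minmax_normalise(plane_level(height)), size)
```

Nearest-neighbour resizing copies existing pixel values rather than interpolating between them, so no new intermediate heights are introduced at feature edges.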

	#### Training Hyperparameters

	- Training regime: fp32

	#### Speeds, Sizes, Times

	- Model file size: 598 MiB
	- Training epochs: 69

	## Evaluation

	The performance of the ML‑generated masks was evaluated indirectly through their impact on
	automated levelling. The core quantitative metric used for assessment was the Mean Squared Error (MSE)
	between the output of the afMLevel auto-levelling routines, using ML model generated masks, and a manually
	levelled ground‑truth image. The results were also assessed visually by the developers.

	Full details will be given in the accompanying paper (in preparation; check the [GitHub](https://github.com/mayatek1/afMLevel) for updates).

	### Testing Data & Metrics

	#### Testing Data

	Evaluation was performed on a held‑out set of real AFM height‑map images spanning a wide range of:

	- biological samples,
	- imaging conditions,
	- noise levels,
	- numbers of surface planes,
	- scan artefacts (e.g., line noise, streaks).

	#### Metrics

	- Primary metric: MSE between auto‑levelled and manually levelled images.
	- Distribution analysis: mean vs. median MSE; a large difference between
	the two indicates that failed levelling produces pronounced artefacts.
	- Success‑rate: proportion of images below an MSE threshold of 0.1, selected as
	a conservative boundary for “well‑levelled” outputs.
	- Visual inspection score: percentage of images judged well-levelled by
	developer inspection, used as a complementary subjective metric.
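The quantitative metrics above reduce to a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, the example scores are made up, and the scale on which the 0.1 threshold applies (for example, whether images are normalised first) is not specified in this card.

```python
import numpy as np

def mse(auto_levelled, manual_levelled):
    """Mean squared error between an auto-levelled image and its manual reference."""
    return float(np.mean((auto_levelled - manual_levelled) ** 2))

def summarise(mse_values, threshold=0.1):
    """Mean, median and success rate (fraction of images below the MSE threshold)."""
    mse_values = np.asarray(mse_values, dtype=float)
    return {
        "mean_mse": float(mse_values.mean()),
        "median_mse": float(np.median(mse_values)),
        "success_rate": float(np.mean(mse_values < threshold)),
    }

# Made-up per-image scores: three good results and one failed levelling.
scores = [0.01, 0.02, 0.03, 2.5]
summary = summarise(scores)
```

In this toy example the single failure pulls the mean far above the median, which is the pattern the distribution analysis is designed to expose.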

	### Results

	Initial internal testing indicates that the ML‑generated masks enable reliable automated
	levelling across a broad range of AFM images, including scans with varied noise levels and
	multiple height planes. Quantitative results and statistical analyses will be provided in
	the accompanying paper (in preparation, check the [GitHub](https://github.com/mayatek1/afMLevel) for updates).

	## Citation

	Tekchandani et al., AFMLevel: Deep learning U-Net Models for levelling Atomic Force Microscopy Images and Movies (in preparation, 2026)

	## Model Card Authors

	- Maya Tekchandani
	- Dr Daniel E. Rollins
	- Dr George R. Heath

	## Contact

	For questions or issues, please open a GitHub issue at
	https://github.com/mayatek1/afMLevel or contact:

	George R. Heath, University of Leeds

	Email: G.R.Heath@leeds.ac.uk