afMLevel-background-unet
This U‑Net model predicts tilt, z scanner drift, and other large‑scale imaging artifacts present in Atomic Force Microscopy (AFM) height maps. It outputs a background image, the same size and scale as the raw AFM image, which can be subtracted (via the accompanying afMLevel code) to produce a levelled height map.
Note: afMLevel includes a second model, afMLevel-mask-unet. Rather than directly generating the noise background, it predicts a feature map for use in traditional auto-levelling routines, where line and plane fits are applied to the background (the unmasked, featureless regions). Both models will be described in the accompanying paper; the mask model is available at https://huggingface.co/Heath-AFM-Lab/afMLevel-mask-unet
Model Details
This model is part of the afMLevel project.
Model Description
This model is a 7‑layer U‑Net architecture implemented in PyTorch, trained to perform image‑to‑image regression for background prediction in AFM height maps. The network was trained on 256 × 256‑pixel images and therefore expects inputs of this size at inference time.
The afMLevel repository includes tools for:
- image preprocessing and tiling,
- running inference,
- generating the noise background,
- noise background subtraction,
- integrating the model into batch-processing pipelines for HS-AFM videos and image sets.
Model Card Information
- Developed by: Maya Tekchandani
- Maintained by: Dr Daniel E. Rollins, Dr George R. Heath
- Principal Investigator: Dr George R. Heath
- Affiliation: Department of Physics and Astronomy, University of Leeds, UK
- Funded by:
- Maya Tekchandani is supported by a studentship funded by the Engineering and Physical Sciences Research Council and the Biotechnology and Biological Sciences Research Council.
- Dr Daniel E. Rollins and Dr George R. Heath are funded by Engineering and Physical Sciences Research Council grant EP/W034735/1.
- Shared by: Heath-AFM-Lab
- Model type: U‑Net regression model for AFM background estimation
- License: BSD‑3‑Clause
- Finetuned from model: None (trained from scratch)
Model Sources
- Repository: https://github.com/mayatek1/afMLevel
- Paper: Tekchandani et al., AFMLevel: Deep learning U-Net Models for levelling Atomic Force Microscopy Images and Movies (in preparation, 2026)
- Demo: demonstration notebooks in the GitHub repository
Uses
This model is designed for use within the afMLevel `background_model` module.
Direct Use
The afMLevel inference code operates on NumPy arrays, so raw AFM files must first be loaded using an external reader such as playnano, AFMReader, or a custom loader. Once loaded, the afMLevel package and notebooks handle inference and output of either the predicted background or the levelled image (with the predicted background subtracted) directly.
The model has been primarily tested on biological AFM data (membranes, proteins, DNA origami, lattices, fibres) and is best suited to that context, though it may generalise to other sample types with similar imaging characteristics.
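As a minimal illustration of this workflow, the sketch below loads a pre-exported height map with NumPy; the file name and the `predict_background` entry point are hypothetical stand-ins, and the afMLevel notebooks show the supported loading paths.

```python
# Minimal sketch: preparing a raw AFM height map as a NumPy array for afMLevel.
# np.load on a pre-exported array stands in for a real reader such as playnano
# or AFMReader. `predict_background` is a hypothetical entry point; see the
# afMLevel notebooks for the actual API.
import numpy as np

height_map = np.load("scan.npy").astype(np.float32)  # 2-D array of heights
assert height_map.ndim == 2, "afMLevel expects a single-channel height map"

# Hypothetical afMLevel call, shown only to illustrate the intended flow:
# background = predict_background(height_map)
# levelled = height_map - background
```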
Downstream Use
- Integration into playnano, enabling end-to-end reading and levelling of high-speed AFM (HS-AFM) movies; the `afMLevel` package works as a plugin for `playnano` for easy integration (use the processing step `level_ml_bg`).
Out‑of‑Scope Use
This model is not intended for:
- prediction of physical or mechanical properties,
- denoising heavily corrupted AFM scans outside the training distribution,
- interpretation of AFM contact mechanics,
- specialised AFM modes (KPFM, MFM, FMM, etc.) without validation,
- non‑biological samples without performance verification.
Bias, Risks, and Limitations
- The model was trained on a specific dataset of real AFM height maps; performance may degrade for very different imaging modes, scan sizes, or materials.
- Extremely noisy scans or those containing jump‑to‑contact instabilities may produce inaccurate background predictions.
- The model may occasionally identify horizontal sample features as part of the background, causing them to be subtracted from the levelled image and distorting local pixel height values.
- Users should visually inspect a subset of the predicted backgrounds to confirm that sample features are not present.
- The levelled outputs should also be inspected visually before scientific interpretation.
Recommendations
- Manually verify a subset of predicted backgrounds and levelled images.
- Avoid applying the model to imaging modes it was not trained on without validation.
How to Get Started with the Model
Use the model through the afMLevel repository, which handles inference, noise background generation and levelling. Demonstration notebooks are available within the GitHub repository.
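For direct PyTorch use, a minimal inference sketch is shown below. It assumes the checkpoint deserialises to a callable `nn.Module` via `torch.load` and that the input is a single preprocessed 256 × 256 tile; the file names are illustrative, and the notebooks document the supported loading path.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Assumption: the checkpoint unpickles to a full model object. Recent PyTorch
# versions may additionally require weights_only=False for pickled modules.
model = torch.load("afMLevel-background-unet.pt", map_location=device)
model.eval()

img = np.load("tile_256x256.npy").astype(np.float32)  # one 256 x 256 tile

# Match the training preprocessing: min-max normalise to [0, 1]
# (the full pipeline also applies an initial plane fit; see Preprocessing).
norm = (img - img.min()) / (img.max() - img.min())

with torch.no_grad():
    x = torch.from_numpy(norm)[None, None].to(device)  # shape (1, 1, 256, 256)
    background = model(x).squeeze(0).squeeze(0).cpu().numpy()

levelled = norm - background  # levelled tile in normalised units
```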
Inference Speed
Per-frame processing time scales with image resolution in discrete steps rather than continuously: runtime is dominated by inference, and the number of inference passes grows each time the image resolution crosses a 256-pixel threshold and additional tiles are generated. Across the range of resolutions that map to the same tile count, per-frame processing time remains approximately constant, producing plateaus. The background model is slower than classical non-ML levelling routines but remains practical for research pipelines at typical AFM and HS-AFM resolutions. Full timing benchmarks are provided in the accompanying paper (in preparation; check the GitHub repository for updates).
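As an illustration of the plateau behaviour, the snippet below assumes the tile count follows ceil(N/256)² for an N × N image, consistent with the 256-pixel thresholds described above (the repository's exact tiling rule may differ):

```python
import math

def n_tiles(n: int) -> int:
    """Assumed tile count for an N x N image: ceil(N/256) tiles per axis."""
    f = math.ceil(n / 256)
    return f * f

for n in (256, 300, 512, 513, 768, 1024):
    print(n, n_tiles(n))
# 256 -> 1; 300 and 512 -> 4; 513 and 768 -> 9; 1024 -> 16
# (tile count, and hence runtime, plateaus between thresholds)
```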
Training Details
The model was trained from scratch on real AFM topography data using the PyTorch framework.
Training Data
This model was trained on a dataset of 2,001 real AFM height‑map images from several AFM labs, spanning a range of biological sample types (membranes, proteins, DNA origami, lattices, fibres), AFM instruments (JPK, Bruker/RIBM, Asylum), and image features (blobs, holes, fibres, strong line noise, multiple planes).
To increase dataset size and improve generalization, images were augmented using:
- reflection along the y‑axis,
- rotation by 180°.
This produced 6,003 training images. An 80:20 train‑validation split was used.
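A minimal NumPy sketch of these two augmentations (axis conventions and implementation details in the repository may differ):

```python
import numpy as np

def augment(img: np.ndarray) -> list[np.ndarray]:
    """Return the original image plus the two augmented variants."""
    reflected = np.flip(img, axis=1)  # reflection along the y-axis (mirror left-right)
    rotated = np.rot90(img, k=2)      # rotation by 180 degrees
    return [img, reflected, rotated]

# 2,001 source images x 3 variants each = 6,003 training images.
```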
Training Procedure
- Architecture: 7‑layer U‑Net with large convolutional filters (9×9)
- Framework: PyTorch
- Optimizer: Adam
- Learning rate: 0.0005
- Objective: pixel‑wise continuous regression
- Activation function: ReLU
- Batch normalisation: applied after each convolutional layer
- Dropout: none (p = 0.0)
- Batch size: 32
- Hardware: trained with GPU acceleration
- Training images: 6,003
- Train/validation split: 80:20 (random)
- Loss function: Mean Squared Error (MSE)
- Loss‑curve diagnostics were used to monitor convergence.
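The skeletal training loop below reflects the hyperparameters listed above. The single conv block is only a stand-in for the repository's 7-layer U-Net (9 × 9 filters, batch normalisation, ReLU), and the tensors are dummy data in place of preprocessed tiles:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the 7-layer U-Net, using the stated design choices:
# 9x9 convolutions, batch normalisation after each conv layer, ReLU.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=9, padding=4),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=9, padding=4),
)

# Dummy data in place of preprocessed 256 x 256 tiles and target backgrounds.
inputs = torch.rand(64, 1, 256, 256)
targets = torch.rand(64, 1, 256, 256)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # Adam, lr = 0.0005
criterion = nn.MSELoss()                                   # pixel-wise regression

model.train()
for epoch in range(59):  # 59 epochs, fp32 throughout
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```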
Preprocessing
Input images were preprocessed identically to the training data:
- An initial first-order plane fit is applied in x and y.
- Min-max normalisation to [0, 1].
- Images larger than 256 × 256 are pixel-split into 256 × 256 tiles for inference.
For the background model, a pixel-split tiling method is used to preserve all pixel values: multiple 256 × 256 tiles are generated by taking alternating pixels.
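A minimal sketch of these preprocessing steps for a 512 × 512 input, which pixel-splits with factor 2 into four 256 × 256 tiles (function names are illustrative, not the repository's API):

```python
import numpy as np

def plane_level(img: np.ndarray) -> np.ndarray:
    """Subtract a least-squares first-order plane fit in x and y."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    A = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, img.ravel(), rcond=None)
    return img - (A @ coeffs).reshape(h, w)

def minmax(img: np.ndarray) -> np.ndarray:
    """Min-max normalise heights to [0, 1]."""
    return (img - img.min()) / (img.max() - img.min())

def pixel_split(img: np.ndarray, factor: int) -> list[np.ndarray]:
    """Split into factor**2 tiles of alternating pixels, so every pixel of
    the original image appears in exactly one tile."""
    return [img[i::factor, j::factor] for i in range(factor) for j in range(factor)]

scan = np.random.rand(512, 512)                           # dummy raw height map
tiles = pixel_split(minmax(plane_level(scan)), factor=2)  # four 256 x 256 tiles
```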
Training Hyperparameters
- Training regime: fp32
Speeds, Sizes, Times
- Model file size: 982 MiB
- Training epochs: 59
Evaluation
The performance of the ML‑generated predicted backgrounds was evaluated indirectly through their impact on levelling. The core quantitative metric used for assessment was the Mean Squared Error (MSE) between the afMLevel predicted-background-subtracted output and a manually levelled ground‑truth image. The results were also assessed visually by the developers.
Full details will be provided in the accompanying paper (in preparation; check the GitHub repository for updates).
Testing Data & Metrics
Testing Data
Evaluation was performed on a held‑out set of real AFM height maps spanning a wide range of:
- biological sample types,
- imaging conditions,
- noise levels,
- numbers of surface planes,
- scan artefacts (e.g., streaks, line noise).
Metrics
- Primary metric: MSE between auto‑levelled and manually levelled images.
- Distribution analysis: mean vs. median MSE; a large gap between the two indicates a skewed error distribution in which a minority of failed levellings produce pronounced artefacts.
- Success‑rate: proportion of images below an MSE threshold of 0.1, selected as a conservative boundary for "well‑levelled" outputs.
- Visual inspection score: percentage of images judged well-levelled by developer inspection, used as a complementary subjective metric.
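A sketch of how these metrics could be computed, using dummy data in place of the held-out test pairs (the paper's exact evaluation code may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy stand-ins for (auto-levelled, manually levelled) test image pairs.
test_pairs = [(rng.random((256, 256)), rng.random((256, 256))) for _ in range(20)]

def levelling_mse(auto: np.ndarray, manual: np.ndarray) -> float:
    """MSE between an auto-levelled image and its manually levelled ground truth."""
    return float(np.mean((auto - manual) ** 2))

errors = np.array([levelling_mse(a, m) for a, m in test_pairs])

mean_mse = errors.mean()              # pulled upward by badly failed levellings
median_mse = np.median(errors)        # robust to the failure tail
success_rate = np.mean(errors < 0.1)  # fraction below the 0.1 MSE threshold
```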
Results
Initial internal testing indicates that the ML‑generated predicted backgrounds enable reliable automated levelling across a broad range of AFM images, including scans with varied noise levels and multiple height planes. Quantitative results and statistical analyses will be provided in the accompanying paper (in preparation; check the GitHub repository for updates).
Citation
Tekchandani et al., AFMLevel: Deep learning U-Net Models for levelling Atomic Force Microscopy Images and Movies (in preparation, 2026)
Model Card Authors
- Maya Tekchandani
- Dr Daniel E. Rollins
- Dr George R. Heath
Contact
For questions or issues, please open a GitHub issue at https://github.com/mayatek1/afMLevel or contact:
George R. Heath - University of Leeds
Email: G.R.Heath@leeds.ac.uk