---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- stm
- materials-science
- nffa-di
base_model:
- google/vit-base-patch32-224-in21k
pipeline_tag: image-classification
---
# Vision Transformer for STM Multi-Tip Artifact Detection
This is a fine-tuned **Vision Transformer (ViT-B/32)** model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of **multi-tip artifacts**, a common distortion that results in duplicated signals and complicates data interpretation.
This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.
## Model Description
The model is a `ViT-B/32` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.
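Before running inference, you can verify the label mapping directly from the model configuration. A minimal sketch; the exact id-to-label assignment shown in the comment is an assumption and should be confirmed against the actual config:

```python
from transformers import AutoConfig

# Inspect the class labels stored in the model config
config = AutoConfig.from_pretrained("t0m-R/vit-stm-artifact-fft")
print(config.id2label)
# Assumed to print something like: {0: 'Artifact-Free', 1: 'Multi-Tip Artifact'}
```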
A key feature of this model is its use of a **Fast Fourier Transform (FFT)** based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:
1. The grayscale STM image.
2. The **amplitude** of the image's Fourier transform.
3. The **phase** of the image's Fourier transform.
This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.
## How to Use
The following Python code shows how to load and use the model for inference.
```python
import torch
import numpy as np
from PIL import Image
from transformers import AutoModelForImageClassification

def preprocess_for_artifact_detection(image_path):
    """
    Loads an STM image and converts it to the required 3-channel format
    (grayscale, magnitude spectrum, phase) for the model.
    """
    try:
        with Image.open(image_path) as img:
            img = img.convert('L').resize((224, 224))
            grayscale_img = np.array(img) / 255.0
    except FileNotFoundError:
        print(f"Error: The file at {image_path} was not found.")
        return None

    # Compute the 2D FFT and shift the zero-frequency component to the center
    fft_data = np.fft.fft2(grayscale_img)
    fft_shifted = np.fft.fftshift(fft_data)
    magnitude_spectrum = np.abs(fft_shifted)
    phase = np.angle(fft_shifted)

    # Stack the three channels into a (C, H, W) array
    stacked_channels = np.stack([grayscale_img, magnitude_spectrum, phase], axis=0)

    # Add a batch dimension (B, C, H, W) and return as a float tensor
    return torch.tensor(stacked_channels, dtype=torch.float32).unsqueeze(0)

# Load the model from the Hub
model_name = "t0m-R/vit-stm-artifact-fft"
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()

# Preprocess
image_path = "path/to/your/stm_image"  # Replace with your image path
preprocessed_image = preprocess_for_artifact_detection(image_path)

# Run inference only if preprocessing succeeded
if preprocessed_image is not None:
    with torch.no_grad():
        logits = model(preprocessed_image).logits
    predicted_label_id = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_label_id]
    print(f"Predicted Label: {predicted_label}")
    # Example output: "Predicted Label: Multi-Tip Artifact"
```
## Preprocessing
**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase, as implemented in the function provided in the "How to Use" section.
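For classifying several images at once, the per-image tensors can be concatenated along the batch dimension. A minimal sketch reusing `preprocess_for_artifact_detection` and `model` from the "How to Use" section; the file names are placeholders:

```python
import torch

# Placeholder paths; replace with your own STM image files
image_paths = ["scan_001.png", "scan_002.png", "scan_003.png"]

# Preprocess each image, keeping paths paired with their tensors
pairs = [(p, preprocess_for_artifact_detection(p)) for p in image_paths]
pairs = [(p, t) for p, t in pairs if t is not None]

# Concatenate the (1, 3, 224, 224) tensors into a (B, 3, 224, 224) batch
batch = torch.cat([t for _, t in pairs], dim=0)

with torch.no_grad():
    logits = model(batch).logits

for (path, _), label_id in zip(pairs, logits.argmax(-1).tolist()):
    print(f"{path}: {model.config.id2label[label_id]}")
```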
## Training Data
The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself.
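The generation procedure can be sketched as follows; the shift offsets and intensity scale below are illustrative assumptions, not the values used to build the training set:

```python
import numpy as np

def make_synthetic_multi_tip(clean, shift=(10, 15), scale=0.5):
    """
    Sketch of the synthetic multi-tip transform: sum the clean image with
    a translated, intensity-scaled copy of itself. `shift` (in pixels) and
    `scale` are illustrative, not the values used in training.
    """
    # Translate by rolling along both axes; np.roll wraps at the borders,
    # whereas a real pipeline might crop or pad instead
    ghost = scale * np.roll(clean, shift, axis=(0, 1))
    combined = clean + ghost
    # Rescale back to [0, 1] so the result remains a valid grayscale image
    combined -= combined.min()
    return combined / max(combined.max(), 1e-12)

# Example: apply to a normalized grayscale STM image (H, W) in [0, 1]
clean = np.random.rand(224, 224)  # stand-in for a real artifact-free scan
synthetic = make_synthetic_multi_tip(clean)
```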
## Citation
If you use this model in your research, please cite the original work:
```bibtex
@article{rodani2024enhancing,
  title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
  author={Rodani, Tommaso and Ansuini, Alessio and Cazzaniga, Alberto},
  journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
  year={2024}
}
``` |