---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- stm
- materials-science
- nffa-di
base_model:
- google/vit-base-patch32-224-in21k
pipeline_tag: image-classification
---

# Vision Transformer for STM Multi-Tip Artifact Detection

This is a fine-tuned **Vision Transformer (ViT-B/32)** model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of **multi-tip artifacts**, a common distortion that results in duplicated signals and complicates data interpretation.

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.



## Model Description 

The model is a `ViT-B/32` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.

A key feature of this model is its use of a **Fast Fourier Transform (FFT)** based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:
1. The grayscale STM image.
2. The **amplitude** of the image's Fourier transform.
3. The **phase** of the image's Fourier transform.

This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.

## How to Use 

The following Python code shows how to load and use the model for inference.

```python
import torch
import numpy as np
from PIL import Image
from transformers import AutoModelForImageClassification

def preprocess_for_artifact_detection(image_path):
    """
    Loads an STM image and converts it to the required 3-channel format
    (grayscale, magnitude spectrum, phase) for the model.
    """
    try:
        with Image.open(image_path) as img:
            img = img.convert('L').resize((224, 224))
            grayscale_img = np.array(img) / 255.0
    except FileNotFoundError:
        print(f"Error: The file at {image_path} was not found.")
        return None

    # Compute FFT, Magnitude, and Phase 
    fft_data = np.fft.fft2(grayscale_img)
    fft_shifted = np.fft.fftshift(fft_data)
    
    magnitude_spectrum = np.abs(fft_shifted)
    phase = np.angle(fft_shifted)

    # Stack channels and convert to PyTorch tensor (C, H, W)
    stacked_channels = np.stack([grayscale_img, magnitude_spectrum, phase], axis=0)
    
    # Add a batch dimension (B, C, H, W) and return as float tensor
    return torch.tensor(stacked_channels, dtype=torch.float32).unsqueeze(0)

# Load the model from the Hub
model_name = "t0m-R/vit-stm-artifact-fft"
model = AutoModelForImageClassification.from_pretrained(model_name)

# Preprocess 
image_path = "path/to/your/stm_image" # Replace with your image path
preprocessed_image = preprocess_for_artifact_detection(image_path)

# Run inference (skipped if preprocessing failed)
if preprocessed_image is not None:
    with torch.no_grad():
        logits = model(preprocessed_image).logits
        predicted_label_id = logits.argmax(-1).item()
        predicted_label = model.config.id2label[predicted_label_id]
    print(f"Predicted Label: {predicted_label}")
    # Example output: "Predicted Label: Multi-Tip Artifact"
```
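If you also want a confidence score alongside the predicted label, the logits can be converted to class probabilities with a softmax. The sketch below is a minimal, self-contained illustration: the logit values and the `id2label` mapping are placeholders for demonstration (in practice, use the `logits` returned by the model and `model.config.id2label`).

```python
import torch

def logits_to_prediction(logits, id2label):
    """Map raw classifier logits to (label, confidence)."""
    probs = torch.softmax(logits, dim=-1)      # normalize logits to probabilities
    conf, idx = probs.max(dim=-1)              # highest-probability class
    return id2label[idx.item()], conf.item()

# Illustrative values only; substitute the model's real outputs.
logits = torch.tensor([[0.2, 1.5]])
id2label = {0: "Artifact-Free", 1: "Multi-Tip Artifact"}

label, conf = logits_to_prediction(logits, id2label)
print(f"{label} (confidence: {conf:.2f})")
```

Reporting the confidence can help flag borderline images for manual inspection rather than trusting the hard label alone.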

## Preprocessing 

**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase, as implemented in the function provided in the "How to Use" section.

## Training Data 

The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself.
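The generation procedure described above can be sketched as follows. This is an illustrative reimplementation, not the project's actual data-generation code: the shift offsets, intensity scale, and renormalization are hypothetical parameters chosen for the example.

```python
import numpy as np

def make_synthetic_multitip(clean, shift=(5, 8), scale=0.5):
    """Create a synthetic multi-tip image from a clean STM image by
    summing it with a translated, intensity-scaled copy of itself.
    `shift` (pixels) and `scale` are illustrative, not the original values."""
    ghost = scale * np.roll(clean, shift, axis=(0, 1))  # translated, dimmed copy
    out = clean + ghost
    # Renormalize back to [0, 1] so the result matches the clean-image range
    return (out - out.min()) / (out.max() - out.min() + 1e-12)

# Example with a random stand-in for a clean 224x224 STM image
clean = np.random.rand(224, 224)
synthetic = make_synthetic_multitip(clean)
```

Because each synthetic image is derived from a known artifact-free original, the labels are exact by construction, which avoids manual annotation of real multi-tip scans.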

## Citation 

If you use this model in your research, please cite the original work:

```bibtex
@article{rodani2024enhancing,
  title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
  author={Rodani, Tommaso and Ansuini, Alessio and Cazziga, Alberto},
  journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
  year={2024}
}
```