---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- stm
- materials-science
- nffa-di
base_model:
- google/vit-base-patch32-224-in21k
pipeline_tag: image-classification
---

# Vision Transformer for STM Multi-Tip Artifact Detection

This is a fine-tuned **Vision Transformer (ViT-B/32)** model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of **multi-tip artifacts**, a common distortion that results in duplicated signals and complicates data interpretation.

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is a `ViT-B/32` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.

A key feature of this model is its use of a **Fast Fourier Transform (FFT)** based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:

1. The grayscale STM image.
2. The **amplitude** of the image's Fourier transform.
3. The **phase** of the image's Fourier transform.

This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.

## How to Use

The following Python code shows how to load and use the model for inference.

```python
import torch
import numpy as np
from PIL import Image
from transformers import AutoModelForImageClassification

def preprocess_for_artifact_detection(image_path):
    """
    Loads an STM image and converts it to the required 3-channel format
    (grayscale, magnitude spectrum, phase) for the model.
""" try: with Image.open(image_path) as img: img = img.convert('L').resize((224, 224)) grayscale_img = np.array(img) / 255.0 except FileNotFoundError: print(f"Error: The file at {image_path} was not found.") return None # Compute FFT, Magnitude, and Phase fft_data = np.fft.fft2(grayscale_img) fft_shifted = np.fft.fftshift(fft_data) magnitude_spectrum = np.abs(fft_shifted) phase = np.angle(fft_shifted) # Stack channels and convert to PyTorch tensor (C, H, W) stacked_channels = np.stack([grayscale_img, magnitude_spectrum, phase], axis=0) # Add a batch dimension (B, C, H, W) and return as float tensor return torch.tensor(stacked_channels, dtype=torch.float32).unsqueeze(0) # Load the model from the Hub model_name = "t0m-R/vit-stm-artifact-fft" model = AutoModelForImageClassification.from_pretrained(model_name) # Preprocess image_path = "path/to/your/stm_image" # Replace with your image path preprocessed_image = preprocess_for_artifact_detection(image_path) # Run inference with torch.no_grad(): logits = model(preprocessed_image).logits predicted_label_id = logits.argmax(-1).item() predicted_label = model.config.id2label[predicted_label_id] print(f"Predicted Label: {predicted_label}") # Expected output: "Predicted Label: Multi-Tip Artifact" ``` ## Preprocessing **This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase, as implemented in the function provided in the "How to Use" section. ## Training Data The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself. 
## Citation

If you use this model in your research, please cite the original work:

```bibtex
@article{rodani2024enhancing,
  title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
  author={Rodani, Tommaso and Ansuini, Alessio and Cazziga, Alberto},
  journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
  year={2024}
}
```