t0m-R committed · 91a5362
Parent(s): 253a48a

Upload ViT-B/16 STM artifact detection model

Files changed:
- README.md (+86, -3)
- config.json (+16, -0)
- pytorch_model.bin (+3, -0)

README.md (CHANGED)
---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- stm
- materials-science
- nffa-di
base_model:
- google/vit-base-patch16-224-in21k
pipeline_tag: image-classification
---

# Vision Transformer for STM Multi-Tip Artifact Detection

This is a fine-tuned **Vision Transformer (ViT-B/16)** model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of **multi-tip artifacts**, a common distortion that results in duplicated signals and complicates data interpretation.

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is a `ViT-B/16` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.

A key feature of this model is its use of a **Fast Fourier Transform (FFT)** based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:

1. The grayscale STM image.
2. The **amplitude** of the image's Fourier transform.
3. The **phase** of the image's Fourier transform.

This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.
## How to Use

The following Python code shows how to load and use the model for inference.

```python
from transformers import AutoModelForImageClassification
import torch

# Load the model from the Hub
model_name = "YourUsername/vit-stm-artifact-fft"  # Replace with your repo name
model = AutoModelForImageClassification.from_pretrained(model_name)

# NOTE: This model requires a custom FFT-based preprocessing function.
# The 'preprocessed_image' tensor must have a shape of (1, 3, 224, 224).
# See the "Preprocessing" section for details.
# preprocessed_image = your_custom_fft_preprocessing_function("path/to/your/stm_image.tiff")

# Run inference
with torch.no_grad():
    logits = model(preprocessed_image).logits

predicted_label_id = logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_label_id]

print(f"Predicted Label: {predicted_label}")
# Example output: "Predicted Label: Multi-Tip Artifact"
```
## Preprocessing

**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase. Please refer to the original paper for the exact implementation details. The core steps involve:

* Loading the image as grayscale and resizing it to 224x224.
* Applying a 2D Fast Fourier Transform (`numpy.fft.fft2`).
* Calculating the amplitude (`np.abs`) and phase (`np.angle`).
* Normalizing and stacking the three channels into a single tensor.
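A minimal sketch of these steps, assuming min-max normalization per channel and a log-scaled, center-shifted spectrum (the exact choices in the original work may differ):

```python
import numpy as np
import torch
from PIL import Image

def fft_preprocess(image_path, size=224):
    """Build the 3-channel [grayscale, FFT amplitude, FFT phase] input.

    This is an illustrative sketch, not the authors' exact pipeline:
    the fftshift, log scaling, and min-max normalization are assumptions.
    Returns a tensor of shape (1, 3, size, size).
    """
    # Load as grayscale and resize to the ViT input resolution
    img = Image.open(image_path).convert("L").resize((size, size))
    gray = np.asarray(img, dtype=np.float32)

    # 2D FFT; shift the zero-frequency component to the center
    fft = np.fft.fftshift(np.fft.fft2(gray))
    amplitude = np.log1p(np.abs(fft))  # log scale tames the dynamic range
    phase = np.angle(fft)

    def norm(x):
        # Min-max normalize a channel to [0, 1]
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    stacked = np.stack([norm(gray), norm(amplitude), norm(phase)])
    return torch.from_numpy(stacked).unsqueeze(0).float()
```

The resulting tensor can be passed directly as `preprocessed_image` in the inference snippet above.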
## Training Data

The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself.
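That generation scheme can be sketched as follows; the shift and scale values here are illustrative assumptions, not the parameters used to build the original dataset:

```python
import numpy as np

def make_multi_tip(clean, shifts=((6, 9),), scales=(0.5,)):
    """Turn a clean STM image (2D array) into a synthetic multi-tip one.

    Each extra "tip" contributes a translated, intensity-scaled copy of
    the clean image. np.roll wraps at the borders, where a real pipeline
    might crop or pad instead.
    """
    out = clean.astype(np.float32).copy()
    for (dy, dx), scale in zip(shifts, scales):
        out += scale * np.roll(clean.astype(np.float32), (dy, dx), axis=(0, 1))
    # Rescale the sum back to [0, 1]
    out -= out.min()
    out /= out.max() + 1e-8
    return out
```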
## Citation

If you use this model in your research, please cite the original work:

```bibtex
@article{rodani2024enhancing,
  title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
  author={Rodani, Tommaso and Ansuini, Alessio and Cazzaniga, Alberto},
  journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
  year={2024}
}
```
config.json (ADDED)

```json
{
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTForImageClassification"
  ],
  "model_type": "vit",
  "num_labels": 2,
  "id2label": {
    "0": "Artifact-Free",
    "1": "Multi-Tip Artifact"
  },
  "label2id": {
    "Artifact-Free": 0,
    "Multi-Tip Artifact": 1
  }
}
```
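At inference time, `model.config.id2label` is built from this mapping. It can be mirrored locally with transformers' `ViTConfig` (all other ViT hyperparameters fall back to library defaults):

```python
from transformers import ViTConfig

# Rebuild just the label mapping from config.json; remaining fields
# (patch size, hidden size, etc.) keep the ViT-B/16 defaults.
config = ViTConfig(
    num_labels=2,
    id2label={0: "Artifact-Free", 1: "Multi-Tip Artifact"},
    label2id={"Artifact-Free": 0, "Multi-Tip Artifact": 1},
)
print(config.id2label[1])  # Multi-Tip Artifact
```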
pytorch_model.bin (ADDED, Git LFS pointer)

version https://git-lfs.github.com/spec/v1
oid sha256:4d3aaaf677542934b42ab898915c555d07337b4a904bd533eb6f50720a92f8d3
size 343264618