t0m-R committed · 91a5362
Parent(s): 253a48a

Upload ViT-B/16 STM artifact detection model

Files changed:
- README.md (+86, -3)
- config.json (+16, -0)
- pytorch_model.bin (+3, -0)

README.md (CHANGED)
---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- stm
- materials-science
- nffa-di
base_model:
- google/vit-base-patch16-224-in21k
pipeline_tag: image-classification
---

# Vision Transformer for STM Multi-Tip Artifact Detection

This is a fine-tuned **Vision Transformer (ViT-B/16)** model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of **multi-tip artifacts**, a common distortion that results in duplicated signals and complicates data interpretation.

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is a `ViT-B/16` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.

A key feature of this model is its use of a **Fast Fourier Transform (FFT)** based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:

1. The grayscale STM image.
2. The **amplitude** of the image's Fourier transform.
3. The **phase** of the image's Fourier transform.

This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.
## How to Use

The following Python code shows how to load and use the model for inference.

```python
from transformers import AutoModelForImageClassification
import torch

# Load the model from the Hub
model_name = "YourUsername/vit-stm-artifact-fft"  # Replace with your repo name
model = AutoModelForImageClassification.from_pretrained(model_name)

# NOTE: This model requires a custom FFT-based preprocessing function.
# The 'preprocessed_image' tensor must have a shape of (1, 3, 224, 224).
# See the "Preprocessing" section for details.
# preprocessed_image = your_custom_fft_preprocessing_function("path/to/your/stm_image.tiff")

# Run inference
with torch.no_grad():
    logits = model(preprocessed_image).logits

predicted_label_id = logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_label_id]

print(f"Predicted Label: {predicted_label}")
# Example output: "Predicted Label: Multi-Tip Artifact"
```
## Preprocessing

**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase. Please refer to the original paper for the exact implementation details. The core steps involve:

* Loading the image as grayscale and resizing it to 224x224.
* Applying a 2D Fast Fourier Transform (`numpy.fft.fft2`).
* Calculating the amplitude (`np.abs`) and phase (`np.angle`).
* Normalizing and stacking the three channels into a single tensor.
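A minimal sketch of these steps, assuming min-max normalization per channel and a log-scaled, center-shifted spectrum (the exact choices in the original work may differ):

```python
import numpy as np
import torch
from PIL import Image

def fft_preprocess(image_path, size=224):
    """Build the 3-channel [grayscale, FFT amplitude, FFT phase] input.

    This is an illustrative sketch, not the authors' exact pipeline:
    the fftshift, log scaling, and min-max normalization are assumptions.
    Returns a tensor of shape (1, 3, size, size).
    """
    # Load as grayscale and resize to the ViT input resolution
    img = Image.open(image_path).convert("L").resize((size, size))
    gray = np.asarray(img, dtype=np.float32)

    # 2D FFT; shift the zero-frequency component to the center
    fft = np.fft.fftshift(np.fft.fft2(gray))
    amplitude = np.log1p(np.abs(fft))  # log scale tames the dynamic range
    phase = np.angle(fft)

    def norm(x):
        # Min-max normalize a channel to [0, 1]
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    stacked = np.stack([norm(gray), norm(amplitude), norm(phase)])
    return torch.from_numpy(stacked).unsqueeze(0).float()
```

The resulting tensor can be passed directly as `preprocessed_image` in the inference snippet above.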
## Training Data

The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself.
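That generation scheme can be sketched as follows; the shift and scale values here are illustrative assumptions, not the parameters used to build the original dataset:

```python
import numpy as np

def make_multi_tip(clean, shifts=((6, 9),), scales=(0.5,)):
    """Turn a clean STM image (2D array) into a synthetic multi-tip one.

    Each extra "tip" contributes a translated, intensity-scaled copy of
    the clean image. np.roll wraps at the borders, where a real pipeline
    might crop or pad instead.
    """
    out = clean.astype(np.float32).copy()
    for (dy, dx), scale in zip(shifts, scales):
        out += scale * np.roll(clean.astype(np.float32), (dy, dx), axis=(0, 1))
    # Rescale the sum back to [0, 1]
    out -= out.min()
    out /= out.max() + 1e-8
    return out
```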
## Citation

If you use this model in your research, please cite the original work:

```bibtex
@article{rodani2024enhancing,
  title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
  author={Rodani, Tommaso and Ansuini, Alessio and Cazzaniga, Alberto},
  journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
  year={2024}
}
```
config.json (ADDED)

```json
{
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTForImageClassification"
  ],
  "model_type": "vit",
  "num_labels": 2,
  "id2label": {
    "0": "Artifact-Free",
    "1": "Multi-Tip Artifact"
  },
  "label2id": {
    "Artifact-Free": 0,
    "Multi-Tip Artifact": 1
  }
}
```
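At inference time, `model.config.id2label` is built from this mapping. It can be mirrored locally with transformers' `ViTConfig` (all other ViT hyperparameters fall back to library defaults):

```python
from transformers import ViTConfig

# Rebuild just the label mapping from config.json; remaining fields
# (patch size, hidden size, etc.) keep the ViT-B/16 defaults.
config = ViTConfig(
    num_labels=2,
    id2label={0: "Artifact-Free", 1: "Multi-Tip Artifact"},
    label2id={"Artifact-Free": 0, "Multi-Tip Artifact": 1},
)
print(config.id2label[1])  # Multi-Tip Artifact
```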
pytorch_model.bin (ADDED, Git LFS pointer)

version https://git-lfs.github.com/spec/v1
oid sha256:4d3aaaf677542934b42ab898915c555d07337b4a904bd533eb6f50720a92f8d3
size 343264618