t0m-R

Update README.md

90df10a 5 months ago

4.12 kB

	---
	license: apache-2.0
	language: en
	tags:
	- image-classification
	- vision-transformer
	- pytorch
	- stm
	- materials-science
	- nffa-di
	base_model:
	- google/vit-base-patch32-224-in21k
	pipeline_tag: image-classification
	---

	# Vision Transformer for STM Multi-Tip Artifact Detection

	This is a fine-tuned Vision Transformer (ViT-B/32) model for classifying Scanning Tunneling Microscopy (STM) images. It is designed to detect the presence of multi-tip artifacts, a common distortion that results in duplicated signals and complicates data interpretation.

	This model was developed as part of the NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure) project, funded by the European Union's NextGenerationEU program.



	## Model Description

	The model is a `ViT-B/32` pre-trained on ImageNet-21k. It was fine-tuned to classify an STM image as either `Artifact-Free` or `Multi-Tip Artifact`.

	A key feature of this model is its use of a Fast Fourier Transform (FFT) based preprocessing method. The model's input is not a standard image but a 3-channel tensor composed of:
	1. The grayscale STM image.
	2. The amplitude of the image's Fourier transform.
	3. The phase of the image's Fourier transform.

	This approach significantly improves the model's ability to identify the subtle patterns characteristic of multi-tip artifacts.

	## How to Use

	The following Python code shows how to load and use the model for inference.

	```python
	import torch
	import numpy as np
	from PIL import Image
	from transformers import AutoModelForImageClassification

	def preprocess_for_artifact_detection(image_path):
	"""
	Loads an STM image and converts it to the required 3-channel format
	(grayscale, magnitude spectrum, phase) for the model.
	"""
	try:
	with Image.open(image_path) as img:
	img = img.convert('L').resize((224, 224))
	grayscale_img = np.array(img) / 255.0
	except FileNotFoundError:
	print(f"Error: The file at {image_path} was not found.")
	return None

	# Compute FFT, Magnitude, and Phase
	fft_data = np.fft.fft2(grayscale_img)
	fft_shifted = np.fft.fftshift(fft_data)

	magnitude_spectrum = np.abs(fft_shifted)
	phase = np.angle(fft_shifted)

	# Stack channels and convert to PyTorch tensor (C, H, W)
	stacked_channels = np.stack([grayscale_img, magnitude_spectrum, phase], axis=0)

	# Add a batch dimension (B, C, H, W) and return as float tensor
	return torch.tensor(stacked_channels, dtype=torch.float32).unsqueeze(0)

	# Load the model from the Hub
	model_name = "t0m-R/vit-stm-artifact-fft"
	model = AutoModelForImageClassification.from_pretrained(model_name)

	# Preprocess
	image_path = "path/to/your/stm_image" # Replace with your image path
	preprocessed_image = preprocess_for_artifact_detection(image_path)

	# Run inference
	with torch.no_grad():
	logits = model(preprocessed_image).logits
	predicted_label_id = logits.argmax(-1).item()
	predicted_label = model.config.id2label[predicted_label_id]

	print(f"Predicted Label: {predicted_label}")
	# Expected output: "Predicted Label: Multi-Tip Artifact"
	```

	## Preprocessing

	This model will not work with standard image preprocessing. The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase, as implemented in the function provided in the "How to Use" section.

	## Training Data

	The model was fine-tuned on a synthetic dataset generated from experimental STM images recorded at CNR-IOM, Trieste. Artifact-free images were transformed into synthetic multi-tip images by summing the clean image with translated and intensity-scaled versions of itself.

	## Citation

	If you use this model in your research, please cite the original work:

	```bibtex
	@article{rodani2024enhancing,
	title={Enhancing Multi-Tip Artifact Detection in STM Images Using Fourier Transform and Vision Transformers},
	author={Rodani, Tommaso and Ansuini, Alessio and Cazziga, Alberto},
	journal={Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML},
	year={2024}
	}
	```