t0m-R committed
Commit 18cc612 · 1 Parent(s): d07aa47

Update README.md

Files changed (1):
  1. README.md +34 -11

README.md CHANGED
@@ -37,17 +37,45 @@ This approach significantly improves the model's ability to identify the subtle
 The following Python code shows how to load and use the model for inference.
 
 ```python
-from transformers import AutoModelForImageClassification
 import torch
+import numpy as np
+from PIL import Image
+from transformers import AutoModelForImageClassification
+
+def preprocess_for_artifact_detection(image_path):
+    """
+    Loads an STM image and converts it to the required 3-channel format
+    (grayscale, FFT amplitude, FFT phase) for the model.
+    """
+    # 1. Load and prepare grayscale channel
+    with Image.open(image_path) as img:
+        img = img.convert('L').resize((224, 224))
+        grayscale_img = np.array(img) / 255.0
+
+    # 2. Compute FFT, Amplitude, and Phase
+    fft_data = np.fft.fft2(grayscale_img)
+    fft_shifted = np.fft.fftshift(fft_data)
+
+    amplitude = np.log1p(np.abs(fft_shifted))
+    phase = np.angle(fft_shifted)
+
+    # 3. Normalize channels to be in a 0-1 range
+    amplitude = (amplitude - np.min(amplitude)) / (np.max(amplitude) - np.min(amplitude))
+    phase = (phase - np.min(phase)) / (np.max(phase) - np.min(phase))
+
+    # 4. Stack channels and convert to PyTorch tensor (C, H, W)
+    stacked_channels = np.stack([grayscale_img, amplitude, phase], axis=0)
+
+    # 5. Add a batch dimension (B, C, H, W) and return as float tensor
+    return torch.tensor(stacked_channels, dtype=torch.float32).unsqueeze(0)
 
 # Load the model from the Hub
 model_name = "t0m-R/vit-stm-artifact-fft"
 model = AutoModelForImageClassification.from_pretrained(model_name)
 
-# NOTE: This model requires a custom FFT-based preprocessing function.
-# The 'preprocessed_image' tensor must have a shape of (1, 3, 224, 224).
-# See the "Preprocessing" section for details.
-# preprocessed_image = your_custom_fft_preprocessing_function("path/to/your/stm_image")
+# Preprocess your image
+image_path = "path/to/your/stm_image"  # Replace with your image path
+preprocessed_image = preprocess_for_artifact_detection(image_path)
 
 # Run inference
 with torch.no_grad():
@@ -61,12 +89,7 @@ print(f"Predicted Label: {predicted_label}")
 
 ## Preprocessing
 
-**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase. Please refer to the original paper for the exact implementation details. The core steps involve:
-
-* Loading the image as grayscale and resizing it to 224x224.
-* Applying a 2D Fast Fourier Transform (`numpy.fft.fft2`).
-* Calculating the amplitude (`np.abs`) and phase (`np.angle`).
-* Normalizing and stacking the three channels into a single tensor.
+**This model will not work with standard image preprocessing.** The input must be a 3-channel tensor representing the grayscale image, FFT amplitude, and FFT phase, as implemented in the function provided in the "How to Use" section.
 
 ## Training Data
 
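The FFT channel construction added in this commit can be sanity-checked without the model or an image file. The sketch below re-runs steps 2–4 of the new preprocessing function on a synthetic 224×224 array, using NumPy only (it skips the PIL load and the final torch conversion); the helper name `fft_channels` and the random input are illustrative, not part of the repository:

```python
import numpy as np

def fft_channels(grayscale_img):
    """Recreate steps 2-4 of the committed preprocessing on a (224, 224)
    grayscale array in [0, 1]: log-amplitude and phase of the shifted 2D FFT,
    each min-max normalized, stacked with the grayscale channel."""
    fft_shifted = np.fft.fftshift(np.fft.fft2(grayscale_img))
    amplitude = np.log1p(np.abs(fft_shifted))
    phase = np.angle(fft_shifted)
    amplitude = (amplitude - amplitude.min()) / (amplitude.max() - amplitude.min())
    phase = (phase - phase.min()) / (phase.max() - phase.min())
    return np.stack([grayscale_img, amplitude, phase], axis=0)

# Synthetic "STM image": uniform random noise in [0, 1)
rng = np.random.default_rng(0)
img = rng.random((224, 224))

channels = fft_channels(img)
print(channels.shape)  # (3, 224, 224)
```

Adding the batch dimension via `torch.tensor(channels, dtype=torch.float32).unsqueeze(0)` then yields the `(1, 3, 224, 224)` shape the model expects, matching the NOTE that this commit replaces with a real implementation.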