---
language:
  - en
tags:
  - vision-transformer
  - dinov2
  - neuropathology
  - image-classification
  - university-of-kentucky
license: apache-2.0
datasets:
  - uky-neuropathology-placeholder
base_model: facebook/dinov2-giant
co2_eq_emissions:
  emissions: 1
  source: Estimated
---

Model Card for Neuropathology Vision Transformer

This model is a Vision Transformer adapted for neuropathology tasks, developed using data from the University of Kentucky. It is initialized from the self-supervised DINOv2 foundation model and further trained with DINOv2-style self-supervised objectives.

Model Details

  • Model Type: Vision Transformer (ViT) for neuropathology.
  • Developed by: Center for Applied Artificial Intelligence (CAAI)
  • Model Date: 05/05/2025
  • Base Model Architecture: DINOv2-giant (https://huggingface.co/facebook/dinov2-giant)
  • Input: Image (224x224).
  • Output: Class token and patch tokens. These can be used for various downstream tasks (e.g., classification, segmentation, similarity search).
  • Embedding Dimension: 1536
  • Patch Size: 14
  • Image Size Compatibility:
    • The model was trained on images/patches of size 224x224.
    • Other input sizes are also accepted, since positional embeddings are interpolated at inference time; performance is typically best at or near the training resolution (see the shape sketch after this list).
  • License: Apache 2.0 (as declared in the YAML metadata above).
  • Repository: [PLACEHOLDER: Link to your model repository (e.g., GitHub, Hugging Face Hub)]
  • Paper(s)/Reference(s):
    • [PLACEHOLDER: Link to your paper if applicable]
    • [Optional: Link to relevant University of Kentucky data descriptor or study paper]
    • Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision" (https://arxiv.org/abs/2304.07193)
    • Darcet et al., "Vision Transformers Need Registers" (https://arxiv.org/abs/2309.16588) (if registers are used)
  • Demo: [PLACEHOLDER: Link to your demo, if any]
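
To make the output format concrete, here is a hedged sketch (reusing the `IBI-CAAI/NP-TEST-0` model path from the usage example later in this card) that inspects the token layout: with patch size 14, a 224x224 input yields 16 x 16 = 256 patch tokens plus one class token, each 1536-dimensional. If the checkpoint includes register tokens (see the Darcet et al. reference), a few extra tokens would sit between the class and patch tokens.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("IBI-CAAI/NP-TEST-0")
model.eval()

with torch.no_grad():
    pixel_values = torch.randn(1, 3, 224, 224)   # dummy 224x224 RGB batch
    outputs = model(pixel_values=pixel_values)

tokens = outputs.last_hidden_state               # expected shape: (1, 257, 1536)
cls_token = tokens[:, 0, :]                      # class token, shape (1, 1536)
patch_tokens = tokens[:, 1:, :]                  # patch tokens, shape (1, 256, 1536)
print(cls_token.shape, patch_tokens.shape)
```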

Intended Uses

This model is intended for research purposes in the field of neuropathology.

  • Primary Intended Uses:
    • Classification of tissue samples based on the presence/severity of neuropathological changes.
    • Feature extraction for quantitative analysis of neuropathology (a linear-probe sketch combining both uses follows this list).
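
For example, the two intended uses combine naturally in a linear probe: extract frozen embeddings (see the code in the next section) and train a lightweight classifier on top. Below is a minimal sketch assuming a `(N, 1536)` embedding array and integer labels; the random stand-ins are only there so the snippet runs end to end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Replace these stand-ins with real embeddings/labels from your dataset.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1536)).astype(np.float32)
labels = rng.integers(0, 3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0, stratify=labels)

probe = LogisticRegression(max_iter=1000)  # linear probe on frozen features
probe.fit(X_train, y_train)
print(classification_report(y_test, probe.predict(X_test)))
```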

How to Get Started with the Model

The snippets below load the model with Hugging Face transformers and extract embeddings in three ways; adjust the paths and preprocessing for your own setup.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor
from torchvision import transforms

def get_embeddings_with_processor(image_path, model_path):
    """
    Extract embeddings using a HuggingFace image processor.
    This approach handles normalization and resizing automatically.
    
    Args:
        image_path: Path to the image file
        model_path: Path to the model directory (the image processor
            config is loaded from the same path)
    
    Returns:
        Image embeddings from the model
    """
    # Load model
    model = AutoModel.from_pretrained(model_path)
    model.eval()
    
    # Load processor from config
    image_processor = AutoImageProcessor.from_pretrained(model_path)
    
    # Process the image
    with torch.no_grad():
        image = Image.open(image_path).convert('RGB')
        inputs = image_processor(images=image, return_tensors="pt")
        outputs = model(**inputs)
        embeddings = outputs.last_hidden_state[:, 0, :]
    
    return embeddings

def get_embeddings_direct(image_path, model_path, mean=[0.83800817, 0.6516568, 0.78056043], std=[0.08324149, 0.09973671, 0.07153901]):
    """
    Extract embeddings directly without an image processor.
    This approach keeps the image at its native resolution; DINOv2 interpolates
    its positional embeddings, so inputs of varying sizes are supported.
    
    Args:
        image_path: Path to the image file
        model_path: Path to the model directory
        mean: Normalization mean values
        std: Normalization standard deviation values
    
    Returns:
        Image embeddings from the model
    """
    # Load model
    model = AutoModel.from_pretrained(model_path)
    model.eval()
    
    # Define transformation - just converting to tensor and normalizing
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std)
    ])
    
    # Process the image
    with torch.no_grad():
        # Open image and convert to RGB
        image = Image.open(image_path).convert('RGB')
        # Convert image to tensor
        image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
        # Feed to model
        outputs = model(pixel_values=image_tensor)
        # Get embeddings
        embeddings = outputs.last_hidden_state[:, 0, :]
    
    return embeddings

def get_embeddings_resized(image_path, model_path, size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    """
    Extract embeddings with explicit resizing to 224x224.
    This approach ensures consistent input size regardless of original image dimensions.
    
    Args:
        image_path: Path to the image file
        model_path: Path to the model directory
        size: Target size for resizing (default: 224x224)
        mean: Normalization mean values (defaults are the standard ImageNet
            statistics; substitute the model's own values, e.g., those used in
            get_embeddings_direct, if they differ)
        std: Normalization standard deviation values (same caveat as mean)
    
    Returns:
        Image embeddings from the model
    """
    # Load model
    model = AutoModel.from_pretrained(model_path)
    model.eval()
    
    # Define transformation with explicit resize
    transform = transforms.Compose([
        transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std)
    ])
    
    # Process the image
    with torch.no_grad():
        image = Image.open(image_path).convert('RGB')
        image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
        outputs = model(pixel_values=image_tensor)
        embeddings = outputs.last_hidden_state[:, 0, :]
    
    return embeddings

# Example usage
if __name__ == "__main__":
    image_path = "test.jpg"
    model_path = "IBI-CAAI/NP-TEST-0" 
    
    # Method 1: Using image processor (recommended for consistency)
    embeddings1 = get_embeddings_with_processor(image_path, model_path)
    print('Embedding shape (with processor):', embeddings1.shape)
     
    # Method 2: Direct approach without resizing (works with various resolutions)
    embeddings2 = get_embeddings_direct(image_path, model_path)
    print('Embedding shape (direct):', embeddings2.shape)
    
    # Method 3: With explicit resize to 224x224
    embeddings3 = get_embeddings_resized(image_path, model_path)
    print('Embedding shape (resized):', embeddings3.shape)
```
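
A note on choosing between the three methods: Method 1 reproduces the preprocessing stored in the model's processor config and is the safest default. Method 2 preserves the native resolution, which can matter for large pathology patches. Method 3 forces a fixed 224x224 input but, as written, defaults to ImageNet normalization statistics; substitute the model's own statistics (as in Method 2 or the processor config) if they differ.
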
Training Data

  • Dataset(s): The model was trained on data from the University of Kentucky.
    • Name/Identifier: [PLACEHOLDER: Specify the formal name or internal identifier of the dataset, e.g., "UKy Alzheimer's Disease Center Neuropathology Whole Slide Image Cohort v1.0"]
    • Source: University of Kentucky, [PLACEHOLDER: Specific Department, Center, or PI, e.g., Sanders-Brown Center on Aging, Department of Pathology]
    • Description: [PLACEHOLDER: Describe the data. E.g., "Digitized whole slide images (WSIs) of human post-mortem brain tissue sections from [number] subjects. Sections were stained with [e.g., Hematoxylin and Eosin (H&E), and immunohistochemistry for Amyloid-beta (Aβ) and phosphorylated Tau (pTau)]. Images were acquired using [e.g., Aperio AT2 scanner at 20x magnification]."]
    • Preprocessing: [PLACEHOLDER: Describe significant preprocessing steps. E.g., "WSIs were tiled into non-overlapping [e.g., 224x224 pixel] patches. Tiles with excessive background or artifacts were excluded. Color normalization using [Method, e.g., Macenko method] was applied."] (A minimal tiling sketch follows this list.)
    • Annotation (if applicable for supervised fine-tuning or evaluation): [PLACEHOLDER: Describe the annotation process. E.g., "Regions of interest (ROIs) for [pathologies] were annotated by board-certified neuropathologists. For classification tasks, slide-level or region-level labels for [disease/pathology presence/severity] were provided."]
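
If preprocessing follows the tiling pattern described in the placeholder above, a minimal sketch of non-overlapping 224x224 tiling with a simple near-white background filter is shown below. This is illustrative only; production pipelines typically read WSIs with dedicated libraries such as OpenSlide, and the threshold values here are assumptions:

```python
import numpy as np
from PIL import Image

def tile_image(path, tile=224, max_bg_fraction=0.5, bg_threshold=220):
    """Yield non-overlapping tile x tile RGB patches, skipping mostly-background tiles."""
    img = np.asarray(Image.open(path).convert("RGB"))
    height, width, _ = img.shape
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            patch = img[y:y + tile, x:x + tile]
            # A pixel whose darkest channel is still bright is treated as background.
            bg_fraction = (patch.min(axis=-1) > bg_threshold).mean()
            if bg_fraction <= max_bg_fraction:
                yield patch
```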

Training Procedure

  • Training System/Framework: DINO-MX (Modular & Flexible Self-Supervised Training Framework)
  • Base Model (if fine-tuning): Pretrained facebook/dinov2-giant loaded from Hugging Face Hub.
  • Training Objective(s): Self-supervised learning with the DINO loss and the iBOT masked-image-modeling loss (a sketch of the DINO term follows this list).
  • Key Hyperparameters (example):
    • Batch size: 32
    • Learning rate: 1.0e-4
    • Iterations: 5,000
    • Optimizer: AdamW
    • Weight decay: 0.04 to 0.4 (a range, typically scheduled over training in DINOv2-style recipes)
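
For orientation, the DINO term of the objective can be sketched as follows. This is a generic restatement of the published DINO formulation (a centered, sharpened teacher distribution supervising the student via cross-entropy), not the DINO-MX implementation; the function name and temperature defaults are illustrative. The iBOT term applies the same idea to masked patch tokens.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              center: torch.Tensor,
              student_temp: float = 0.1,
              teacher_temp: float = 0.04) -> torch.Tensor:
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution over projection-head outputs."""
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1)
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```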

Evaluation

  • Task(s): Classification, KNN, Clustering, Robustness

  • Metrics: Accuracy, Precision, Recall, F1

  • Dataset(s): Neuro Path dataset

  • Results: The model achieved strong performance across multiple evaluation methods on the Neuro Path dataset; a sketch of how such metrics can be computed from frozen embeddings follows the numbers below.

    Linear Probe Performance:

    • Accuracy: 80.17%
    • Precision: 79.20%
    • Recall: 79.60%
    • F1 Score: 77.88%

    K-Nearest Neighbors Classification:

    • Accuracy: 83.76%
    • Precision: 83.34%
    • Recall: 83.76%
    • F1 Score: 83.40%

    Clustering Quality:

    • Silhouette Score: 0.267
    • Adjusted Mutual Information: 0.473

    Robustness Score: 0.574

    Overall Performance Score: 0.646
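
For context on how numbers like these are typically produced, here is a hedged scikit-learn sketch operating on frozen `(N, 1536)` embeddings with integer labels. The random stand-ins exist only so the snippet runs; this is not the exact evaluation harness used for this card:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import (accuracy_score, f1_score,
                             silhouette_score, adjusted_mutual_info_score)

# Replace these stand-ins with real embeddings/labels from your dataset.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 1536)).astype(np.float32)
labels = rng.integers(0, 4, size=300)

# KNN classification on frozen features.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0, stratify=labels)
knn = KNeighborsClassifier(n_neighbors=20).fit(X_train, y_train)
pred = knn.predict(X_test)
print("KNN accuracy:", accuracy_score(y_test, pred))
print("KNN weighted F1:", f1_score(y_test, pred, average="weighted"))

# Clustering quality on the full embedding set.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embeddings)
print("Silhouette:", silhouette_score(embeddings, clusters))
print("Adjusted Mutual Information:", adjusted_mutual_info_score(labels, clusters))
```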

Ethical Considerations

  • Data Usage:
    • [PLACEHOLDER: E.g., "The data from the University of Kentucky used for training and evaluating this model was collected and utilized under Institutional Review Board (IRB) protocol #[XYZ] at the University of Kentucky.", "All data was de-identified prior to its use in this research in accordance with IRB-approved procedures and applicable privacy regulations (e.g., HIPAA)."]
  • Patient Privacy:
    • [PLACEHOLDER: E.g., "Measures were taken to ensure de-identification of patient data. The model outputs do not contain personally identifiable information."]
  • Intended Use Context:
    • This model is intended for research purposes to augment the capabilities of neuropathology researchers. It is not a medical device and should not be used for direct clinical decision-making, diagnosis, or treatment planning without comprehensive validation, regulatory approval (if applicable), and oversight by qualified medical professionals.
  • Fairness and Bias Mitigation:
    • [PLACEHOLDER: Describe any steps taken during development to assess or mitigate bias, or plans for future work in this area. E.g., "Ongoing work includes evaluating model performance across different demographic subgroups represented in the University of Kentucky dataset to identify and address potential disparities."]

Citation / BibTeX

[PLACEHOLDER: If your model is described in a publication, provide its BibTeX entry here.]

```bibtex
@misc{yourlastname_year_modelname,
  author    = {[PLACEHOLDER: Your Name/Group Name, e.g., Doe, John and The University of Kentucky Neuropathology AI Group]},
  title     = {[PLACEHOLDER: Neuropathology Vision Transformer (University of Kentucky Data)]},
  year      = {[PLACEHOLDER: YYYY]},
  publisher = {[PLACEHOLDER: e.g., Hugging Face or arXiv if pre-print, or Journal Name if published]},
  url       = {[PLACEHOLDER: Link to model Hub page or paper]}
}
```

[Optional: Add BibTeX for the DINOv2 and Vision Transformers Need Registers papers if they are core to your methodology.]

```bibtex
@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  year={2023},
  eprint={2304.07193},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```