MoCo-TP-ResNet-50

ResNet-50 model pre-trained using MoCo-v2 with Temporal Pairing (TP) for geography-aware self-supervised learning on remote sensing images.

Model Details

  • Architecture: ResNet-50
  • Pre-training: MoCo-v2 with Temporal Pairing (TP)
  • Input size: 224×224×3
  • Feature dimension: 2048 (before classification head)
  • Parameters: ~23.5M (backbone only, excluding the classification head)
  • Training: Self-supervised pre-training on fMoW dataset (200 epochs)

Usage

Feature Extraction

from transformers import AutoModelForImageClassification
import torch

# Load model for feature extraction
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)

# Inference - extract features
model.eval()
input_image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)

with torch.no_grad():
    outputs = model(pixel_values=input_image, return_dict=True)
    features = outputs["features"]  # Shape: (1, 2048)
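The extracted features can be used directly, for example for image retrieval. A minimal sketch with random stand-in vectors (so it runs without downloading the model; in practice the tensors come from `outputs["features"]`):

```python
import torch
import torch.nn.functional as F

# Stand-ins for features extracted by the model above
query = torch.randn(1, 2048)    # one query image
gallery = torch.randn(5, 2048)  # five gallery images

# Cosine similarity between the query and every gallery feature
sims = F.cosine_similarity(query, gallery)  # shape: (5,)
best = sims.argmax().item()  # index of the most similar gallery image
```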

Fine-tuning for Classification

To fine-tune the model for a specific classification task, you can add a classification head:

from transformers import AutoModelForImageClassification, AutoConfig
import torch.nn as nn

# Load config and modify num_labels
config = AutoConfig.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)
config.num_labels = 10  # Your number of classes

# Load model
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    config=config,
    trust_remote_code=True
)

# The model will automatically replace the identity head with a classification head
# Now you can fine-tune on your dataset
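A minimal fine-tuning loop sketch. To keep the snippet self-contained (no weight download), `model` is replaced here by a small linear stub; substitute the classifier loaded above and your own DataLoader:

```python
import torch
import torch.nn as nn

# Stub standing in for the loaded MoCo-TP-ResNet-50 classifier
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One toy batch; in practice iterate over your DataLoader
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 10, (4,))

model.train()
optimizer.zero_grad()
logits = model(images)            # shape: (4, 10)
loss = criterion(logits, labels)  # cross-entropy against the labels
loss.backward()
optimizer.step()
```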

Model Architecture

The model consists of:

  • Backbone: ResNet-50 (conv1, bn1, layer1-4)
  • Feature extractor: Adaptive average pooling + flattening
  • Classification head: Linear layer (2048 -> num_labels), or Identity for feature extraction

Pre-training Details

This model was pre-trained using:

  • Method: MoCo-v2 (Momentum Contrast) with Temporal Pairing
  • Dataset: fMoW (Functional Map of the World)
  • Epochs: 200
  • Loss: InfoNCE contrastive loss (as in MoCo)
  • Augmentation: MoCo v2 augmentation (random resized crop, color jitter, grayscale, Gaussian blur)

Citation

If you use this model, please cite the original Geography-Aware SSL paper:

@inproceedings{ayush2021geography,
    title={Geography-Aware Self-Supervised Learning},
    author={Ayush, Kumar and Uzkent, Burak and Meng, Chenlin and Tanmay, Kumar and Burke, Marshall and Lobell, David and Ermon, Stefano},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2021}
}

Original Repository: sustainlab-group/geography-aware-ssl

License

MIT License - for academic use only.
