license: mit
tags:
  - remote-sensing
  - computer-vision
  - swin-transformer
  - building-extraction
  - change-detection
  - foundation-model
datasets:
  - remote-sensing-images
model-index:
  - name: RSBuilding-Swin-B
    results: []
library_name: transformers
pipeline_tag: feature-extraction

RSBuilding-Swin-B

A Hugging Face Transformers version of the RSBuilding Swin-Base model, converted from the original MMDetection/MMSegmentation checkpoint format.

Source

Source Code: https://github.com/Meize0729/RSBuilding
Original Checkpoint: https://huggingface.co/models/BiliSakura/RSBuilding

Model Information

Architecture: Swin Transformer Base
Embedding Dimension: 128
Depths: [2, 2, 18, 2]
Number of Heads: [4, 8, 16, 32]
Window Size: 12
Image Size: 384×384
Patch Size: 4×4

Important Notes

Missing Buffer Keys (Expected)

When loading this model, you may see messages about missing buffer keys (typically ~12 keys). This is expected and normal.

These missing keys are buffers that are computed dynamically during model initialization:

relative_position_index: Precomputed index mapping for window-based attention
relative_coords_table: Precomputed coordinate table for relative positions
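relative_position_bias_table: Bias values looked up through relative_position_index. Note that the Transformers Swin implementation registers this table as a learned parameter rather than a derived buffer, so if it is listed as missing, verify that the converted weights loaded as intended.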

Why they're missing:

These buffers are recalculated each time the model is instantiated based on window_size and other configuration parameters
They don't need to be saved in checkpoints because they're deterministic and computed from config
This is standard behavior in HuggingFace Swin transformers
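To illustrate why such a buffer can be rebuilt from the config alone, here is a minimal pure-Python sketch of how a Swin-style relative position index can be derived from window_size. The function name is hypothetical and this is not the Transformers source code, only the same idea written out with plain lists.

```python
def build_relative_position_index(window_size):
    """Return a (ws*ws) x (ws*ws) table of indices into a (2*ws-1)**2 bias table.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    # All (row, col) positions inside one attention window, in row-major order.
    coords = [(r, c) for r in range(window_size) for c in range(window_size)]
    span = 2 * window_size - 1  # number of distinct offsets along one axis
    index = []
    for qr, qc in coords:
        row = []
        for kr, kc in coords:
            # Shift offsets from [-(ws-1), ws-1] to [0, 2*ws-2], then flatten.
            dr = qr - kr + window_size - 1
            dc = qc - kc + window_size - 1
            row.append(dr * span + dc)
        index.append(row)
    return index


index = build_relative_position_index(12)  # window_size from this model's config
print(len(index), len(index[0]))           # 144 144
print(max(max(row) for row in index) + 1)  # 529
```

Because the result depends only on window_size, calling the function twice gives identical tables, which is why nothing is lost when such an index is absent from a checkpoint.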

Action required: None. The model will work correctly with these buffers computed automatically.
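If you would like to see exactly which keys were reported, one option is to group them by their final name component. The sketch below uses a hypothetical helper, summarize_missing_keys, and made-up key names purely to show the output format. The commented lines rely on the output_loading_info argument of from_pretrained and require downloading the weights.

```python
from collections import Counter


def summarize_missing_keys(missing_keys):
    """Count missing state-dict keys by their final name component.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    return Counter(key.rsplit(".", 1)[-1] for key in missing_keys)


# Made-up key names, only to show what the summary looks like.
example_keys = [
    "encoder.layers.0.blocks.0.attention.self.relative_position_index",
    "encoder.layers.0.blocks.1.attention.self.relative_position_index",
]
print(dict(summarize_missing_keys(example_keys)))

# With the real model (downloads the checkpoint):
# from transformers import SwinModel
# model, info = SwinModel.from_pretrained(
#     "BiliSakura/RSBuilding-Swin-B", output_loading_info=True
# )
# print(dict(summarize_missing_keys(info["missing_keys"])))
```

Any name in the summary that corresponds to learned weights, rather than to a quantity derived from the config, is worth a closer look.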

Quick Start

Installation

pip install transformers torch pillow

Inference Example

from transformers import SwinModel, AutoImageProcessor
from PIL import Image
import torch

# Load model and processor
model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-B")
processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-B")

# Load and process image
image = Image.open("your_image.jpg").convert("RGB")  # the model expects 3-channel input
inputs = processor(image, return_tensors="pt")

# Forward pass
with torch.no_grad():
    outputs = model(**inputs)

# Get features
# outputs.last_hidden_state: (batch_size, num_patches, hidden_size)
# outputs.pooler_output: (batch_size, hidden_size) - pooled representation
features = outputs.last_hidden_state
pooled_features = outputs.pooler_output

print(f"Feature shape: {features.shape}")
print(f"Pooled feature shape: {pooled_features.shape}")
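The shapes printed above can be anticipated from the configuration values listed in this card. The following is a rough sketch, assuming the image processor resizes inputs to 384×384 and that patch merging halves the resolution and doubles the channel width between consecutive stages. The function name is hypothetical.

```python
def swin_output_shapes(image_size, patch_size, embed_dim, depths, batch_size=1):
    """Estimate the shapes of last_hidden_state and pooler_output.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    num_stages = len(depths)
    # Patch merging runs between stages, so it is applied num_stages - 1 times.
    downsample = patch_size * 2 ** (num_stages - 1)
    side = image_size // downsample
    hidden_size = embed_dim * 2 ** (num_stages - 1)
    return (batch_size, side * side, hidden_size), (batch_size, hidden_size)


features_shape, pooled_shape = swin_output_shapes(384, 4, 128, [2, 2, 18, 2])
print(features_shape)  # (1, 144, 1024)
print(pooled_shape)    # (1, 1024)
```

In other words, a 384×384 input is reduced by a factor of 32 to a 12×12 grid of tokens, each with 1024 channels.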

Feature Extraction for Downstream Tasks

from transformers import SwinModel, AutoImageProcessor
from PIL import Image  # needed for Image.open below
import torch

model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-B")
processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-B")

# Process image
image = Image.open("your_image.jpg").convert("RGB")  # the model expects 3-channel input
inputs = processor(image, return_tensors="pt")

# Extract features
with torch.no_grad():
    outputs = model(**inputs)
    
# Use pooled features for classification/regression
features = outputs.pooler_output  # Shape: (1, 1024)

# Or use last hidden state for dense prediction tasks
spatial_features = outputs.last_hidden_state  # Shape: (1, num_patches, 1024)
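For dense prediction it helps to know which part of the image each token in last_hidden_state nominally covers. The sketch below assumes a 384×384 input, a square 12×12 token grid (144 tokens), and row-major token order. The helper name is hypothetical, and the box describes only the token's nominal patch, not its full receptive field.

```python
import math


def token_to_pixel_box(token_index, num_tokens=144, image_size=384):
    """Map a token index to its (left, top, right, bottom) pixel box.

    Hypothetical helper for illustration; assumes row-major token order.
    """
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError("expected a square token grid")
    stride = image_size // side  # pixels covered by one token along each axis
    row, col = divmod(token_index, side)
    return (col * stride, row * stride, (col + 1) * stride, (row + 1) * stride)


print(token_to_pixel_box(0))    # (0, 0, 32, 32)
print(token_to_pixel_box(13))   # (32, 32, 64, 64)
print(token_to_pixel_box(143))  # (352, 352, 384, 384)
```

Under the same row-major assumption, the token sequence can be viewed as a feature map in PyTorch with spatial_features.transpose(1, 2).reshape(1, 1024, 12, 12).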

Model Configuration

The model uses the following configuration:

image_size: 384
patch_size: 4
num_channels: 3
embed_dim: 128
depths: [2, 2, 18, 2]
num_heads: [4, 8, 16, 32]
window_size: 12
mlp_ratio: 4.0
hidden_act: "gelu"
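As a sanity check on these values, the per-stage layout they imply can be worked out by hand or with a few lines of Python. This is a sketch with a hypothetical helper name, assuming the usual Swin pattern in which each stage transition halves the token grid and doubles the channel width.

```python
def swin_stage_summary(image_size, patch_size, embed_dim, depths, num_heads, window_size):
    """List resolution, width, and head size for each stage.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    stages = []
    resolution = image_size // patch_size  # token grid side after patch embedding
    dim = embed_dim
    for depth, heads in zip(depths, num_heads):
        stages.append({
            "resolution": resolution,
            "dim": dim,
            "blocks": depth,
            "heads": heads,
            "head_dim": dim // heads,
            # True when the grid splits into whole windows without padding.
            "divides_evenly": resolution % window_size == 0,
        })
        resolution //= 2
        dim *= 2
    return stages


for stage in swin_stage_summary(384, 4, 128, [2, 2, 18, 2], [4, 8, 16, 32], 12):
    print(stage)
```

With this configuration the grid sides are 96, 48, 24, and 12, every head has 32 channels, and each grid is an exact multiple of the window size of 12.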

Citation

If you use this model, please cite the original RSBuilding paper:

@article{wangRSBuildingGeneralRemote2024a,
  title = {{{RSBuilding}}: {{Toward General Remote Sensing Image Building Extraction}} and {{Change Detection With Foundation Model}}},
  shorttitle = {{{RSBuilding}}},
  author = {Wang, Mingze and Su, Lili and Yan, Cilin and Xu, Sheng and Yuan, Pengcheng and Jiang, Xiaolong and Zhang, Baochang},
  year = {2024},
  journal = {IEEE Transactions on Geoscience and Remote Sensing},
  volume = {62},
  pages = {1--17},
  issn = {1558-0644},
  doi = {10.1109/TGRS.2024.3439395},
  keywords = {Building extraction,Buildings,change detection (CD),Data mining,Feature extraction,federated training,foundation model,Image segmentation,Remote sensing,remote sensing images,Task analysis,Training}
}