metadata
license: mit
tags:
- remote-sensing
- computer-vision
- swin-transformer
- building-extraction
- change-detection
- foundation-model
datasets:
- remote-sensing-images
model-index:
- name: RSBuilding-Swin-B
results: []
library_name: transformers
pipeline_tag: feature-extraction
RSBuilding-Swin-B
Hugging Face Transformers version of the RSBuilding Swin-Base backbone, converted from the original MMDetection/MMSegmentation checkpoint format.
Source
- Source Code: https://github.com/Meize0729/RSBuilding
- Original Checkpoint: https://huggingface.co/models/BiliSakura/RSBuilding
Model Information
- Architecture: Swin Transformer Base
- Embedding Dimension: 128
- Depths: [2, 2, 18, 2]
- Number of Heads: [4, 8, 16, 32]
- Window Size: 12
- Image Size: 384×384
- Patch Size: 4×4
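The values above determine the feature sizes at each stage. The following is a minimal sketch of that arithmetic, assuming the standard Swin layout in which each patch-merging step halves the spatial resolution and doubles the channel count. The variable names are illustrative and not part of the model's API.

```python
# Values taken from the Model Information list above.
image_size = 384
patch_size = 4
embed_dim = 128
depths = [2, 2, 18, 2]

# Patch embedding turns the 384x384 image into a 96x96 grid of tokens.
grid = image_size // patch_size

# Assumption: standard Swin behaviour, where every stage after the first
# sees half the resolution and twice the channels of the previous one.
stages = []
for stage_index in range(len(depths)):
    channels = embed_dim * 2 ** stage_index
    side = grid // 2 ** stage_index
    stages.append((channels, side, side * side))

for stage_index, (channels, side, tokens) in enumerate(stages):
    print(f"stage {stage_index}: {channels} channels, {side}x{side} grid, {tokens} tokens")

final_hidden_size, final_side, final_num_patches = stages[-1]
```

Under these assumptions the last stage has 1024 channels on a 12×12 grid (144 tokens), so the window size of 12 covers the whole final feature map.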
Important Notes
Missing Buffer Keys (Expected)
When loading this model, you may see messages about missing buffer keys (typically ~12 keys). This is expected and normal.
These missing keys are buffers that are computed dynamically during model initialization:
- relative_position_index: Precomputed index mapping for window-based attention
- relative_coords_table: Precomputed coordinate table for relative positions
- relative_position_bias_table: Precomputed bias table
Why they're missing:
- These buffers are recalculated each time the model is instantiated, based on window_size and other configuration parameters
- They don't need to be saved in checkpoints because they're deterministic and computed from config
- This is standard behavior in HuggingFace Swin transformers
Action required: None. The model will work correctly with these buffers computed automatically.
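To illustrate why a buffer such as relative_position_index does not need to be stored, here is a pure-Python sketch that rebuilds it from the window size alone. It follows the construction commonly used in Swin implementations but is not the Transformers code itself, and the function name `build_relative_position_index` is hypothetical.

```python
def build_relative_position_index(window_size):
    """Rebuild the relative position index for a square attention window.

    Hypothetical helper for illustration only; it is not part of the
    Transformers API.
    """
    # Row-major (row, col) coordinates of every token inside one window.
    coords = [(r, c) for r in range(window_size) for c in range(window_size)]
    # Each relative offset lies in [-(W - 1), W - 1], i.e. 2W - 1 values per axis.
    span = 2 * window_size - 1
    index = []
    for (row_i, col_i) in coords:
        row = []
        for (row_j, col_j) in coords:
            # Shift offsets to start at 0, then flatten the 2D offset to one id.
            d_row = row_i - row_j + window_size - 1
            d_col = col_i - col_j + window_size - 1
            row.append(d_row * span + d_col)
        index.append(row)
    return index


small = build_relative_position_index(2)
print(small)  # [[4, 3, 1, 0], [5, 4, 2, 1], [7, 6, 4, 3], [8, 7, 5, 4]]

full = build_relative_position_index(12)
print(len(full), len(full[0]))  # 144 144
```

Because the result depends only on window_size, two instantiations with the same configuration produce identical indices, which is why leaving this buffer out of the checkpoint is harmless.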
Quick Start
Installation
pip install transformers torch pillow
Inference Example
from transformers import SwinModel, AutoImageProcessor
from PIL import Image
import torch
# Load model and processor
model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-B")
processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-B")
# Load and process image
image = Image.open("your_image.jpg").convert("RGB")  # the model expects 3-channel input
inputs = processor(image, return_tensors="pt")
# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Get features
# outputs.last_hidden_state: (batch_size, num_patches, hidden_size)
# outputs.pooler_output: (batch_size, hidden_size) - pooled representation
features = outputs.last_hidden_state
pooled_features = outputs.pooler_output
print(f"Feature shape: {features.shape}")
print(f"Pooled feature shape: {pooled_features.shape}")
Feature Extraction for Downstream Tasks
from transformers import SwinModel, AutoImageProcessor
from PIL import Image
import torch
model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-B")
processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-B")
# Process image
image = Image.open("your_image.jpg").convert("RGB")  # the model expects 3-channel input
inputs = processor(image, return_tensors="pt")
# Extract features
with torch.no_grad():
    outputs = model(**inputs)
# Use pooled features for classification/regression
features = outputs.pooler_output # Shape: (1, 1024)
# Or use last hidden state for dense prediction tasks
spatial_features = outputs.last_hidden_state # Shape: (1, num_patches, 1024)
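For dense prediction tasks, decoders usually expect a 2D feature map rather than a flat token sequence. The sketch below shows one way to reshape a tensor shaped like last_hidden_state into (batch, channels, height, width). It assumes a square input and row-major token order, and it uses a random tensor as a stand-in for the real model output. The helper name `tokens_to_feature_map` is hypothetical.

```python
import math

import torch


def tokens_to_feature_map(tokens: torch.Tensor) -> torch.Tensor:
    """Reshape (batch, num_patches, hidden_size) to (batch, hidden_size, H, W).

    Hypothetical helper; assumes a square token grid in row-major order.
    """
    batch, num_tokens, channels = tokens.shape
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} tokens do not form a square grid")
    # Move channels in front of the token axis, then unfold tokens into a grid.
    return tokens.transpose(1, 2).reshape(batch, channels, side, side)


# Stand-in for outputs.last_hidden_state at 384x384 input: 144 tokens, 1024 channels.
dummy_tokens = torch.randn(1, 144, 1024)
feature_map = tokens_to_feature_map(dummy_tokens)
print(feature_map.shape)  # torch.Size([1, 1024, 12, 12])
```

With real model output, `tokens_to_feature_map(outputs.last_hidden_state)` would give a 12×12 map that a segmentation or change-detection head could upsample.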
Model Configuration
The model uses the following configuration:
- image_size: 384
- patch_size: 4
- num_channels: 3
- embed_dim: 128
- depths: [2, 2, 18, 2]
- num_heads: [4, 8, 16, 32]
- window_size: 12
- mlp_ratio: 4.0
- hidden_act: "gelu"
Citation
If you use this model, please cite the original RSBuilding paper:
@article{wangRSBuildingGeneralRemote2024a,
title = {{{RSBuilding}}: {{Toward General Remote Sensing Image Building Extraction}} and {{Change Detection With Foundation Model}}},
shorttitle = {{{RSBuilding}}},
author = {Wang, Mingze and Su, Lili and Yan, Cilin and Xu, Sheng and Yuan, Pengcheng and Jiang, Xiaolong and Zhang, Baochang},
year = {2024},
journal = {IEEE Transactions on Geoscience and Remote Sensing},
volume = {62},
pages = {1--17},
issn = {1558-0644},
doi = {10.1109/TGRS.2024.3439395},
keywords = {Building extraction,Buildings,change detection (CD),Data mining,Feature extraction,federated training,foundation model,Image segmentation,Remote sensing,remote sensing images,Task analysis,Training}
}