BiliSakura committed on
Commit 03e2168 · verified · 1 Parent(s): 2f900f6

Upload RSBuilding-Swin-T

Files changed (4)
  1. README.md +147 -0
  2. config.json +50 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +21 -0
README.md ADDED
@@ -0,0 +1,147 @@
+ ---
+ license: mit
+ tags:
+ - remote-sensing
+ - computer-vision
+ - swin-transformer
+ - building-extraction
+ - change-detection
+ - foundation-model
+ datasets:
+ - remote-sensing-images
+ model-index:
+ - name: RSBuilding-Swin-T
+   results: []
+ library_name: transformers
+ pipeline_tag: feature-extraction
+ ---
+
+ # RSBuilding-Swin-T
+
+ A Hugging Face Transformers version of the RSBuilding Swin-Tiny model, converted from the MMDetection/MMSegmentation format.
+
+ ## Source
+
+ - **Source Code**: [https://github.com/Meize0729/RSBuilding](https://github.com/Meize0729/RSBuilding)
+ - **Original Checkpoint**: [https://huggingface.co/models/BiliSakura/RSBuilding](https://huggingface.co/models/BiliSakura/RSBuilding)
+
+ ## Model Information
+
+ - **Architecture**: Swin Transformer Tiny
+ - **Embedding Dimension**: 96
+ - **Depths**: [2, 2, 6, 2]
+ - **Number of Heads**: [3, 6, 12, 24]
+ - **Window Size**: 7
+ - **Image Size**: 224×224
+ - **Patch Size**: 4×4
+
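The numbers in the list above determine the feature shapes that appear later in the README. The sketch below assumes the usual Swin layout, in which each stage after the first halves the token grid and doubles the channel width; `stage_shapes` is a hypothetical helper written for illustration, not part of Transformers.

```python
# Values copied from the Model Information list.
embed_dim = 96
depths = [2, 2, 6, 2]
image_size = 224
patch_size = 4


def stage_shapes(embed_dim, depths, image_size, patch_size):
    """Return (channels, grid_side) per stage, assuming the standard Swin layout."""
    grid = image_size // patch_size  # 224 / 4 = 56 tokens per side after patch embedding
    shapes = []
    for stage in range(len(depths)):
        # Assumption: every later stage doubles channels and halves the grid side.
        shapes.append((embed_dim * 2 ** stage, grid // 2 ** stage))
    return shapes


shapes = stage_shapes(embed_dim, depths, image_size, patch_size)
final_channels, final_grid = shapes[-1]
num_tokens = final_grid * final_grid
total_blocks = sum(depths)

print(shapes)
print(final_channels, num_tokens, total_blocks)
```

Under these assumptions the last stage has 768 channels on a 7×7 grid, which agrees with `hidden_size: 768` in `config.json` and with the `(1, 768)` pooled shape quoted in the usage examples.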
+ ## Important Notes
+
+ ### Missing Buffer Keys (Expected)
+
+ When loading this model, you may see messages about missing buffer keys (typically ~12 keys, which would correspond to one per Swin block). **This is expected and normal.**
+
+ These missing keys are buffers that are computed during model initialization rather than learned:
+ - `relative_position_index`: Precomputed index mapping for window-based attention
+
+ Two similarly named tensors are not covered by this note, as far as the Transformers Swin implementation is concerned:
+ - `relative_coords_table`: A Swin V2 buffer; it is not expected to appear for this Swin (V1) model
+ - `relative_position_bias_table`: A learned parameter stored in the checkpoint; if it is reported missing, treat the load as incomplete
+
+ **Why they're missing:**
+ - The index buffers are recalculated each time the model is instantiated, based on `window_size`
+ - They don't need to be saved in checkpoints because they're deterministic and computed from config
+ - This is standard behavior in HuggingFace Swin transformers
+
+ **Action required:** None. The model will work correctly with these buffers computed automatically.
+
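To illustrate why an index buffer of this kind can be rebuilt from `window_size` alone, here is a minimal pure-Python sketch of the usual Swin relative-position indexing scheme. It is an illustration under that assumption, not the Transformers source, and `relative_position_index` is a hypothetical function name.

```python
def relative_position_index(window_size):
    """Build the (ws*ws) x (ws*ws) lookup table used by window attention.

    Entry [i][j] encodes the (row, col) offset between token i and token j
    inside one window as a single integer in [0, (2*ws - 1)**2 - 1].
    """
    coords = [(r, c) for r in range(window_size) for c in range(window_size)]
    span = 2 * window_size - 1  # number of distinct offsets along one axis
    return [
        [
            # Shift offsets to start at 0, then flatten (row_offset, col_offset).
            (ri - rj + window_size - 1) * span + (ci - cj + window_size - 1)
            for (rj, cj) in coords
        ]
        for (ri, ci) in coords
    ]


index = relative_position_index(7)
num_rows = len(index)
num_cols = len(index[0])
num_distinct_offsets = max(max(row) for row in index) + 1

print(num_rows, num_cols, num_distinct_offsets)
```

The table depends only on the window size, so nothing about it needs to be stored in a checkpoint. The learned values it indexes into are a different matter and do need to be present in the weights file.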
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch pillow
+ ```
+
+ ### Inference Example
+
+ ```python
+ from transformers import SwinModel, AutoImageProcessor
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-T")
+ processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-T")
+
+ # Load and process image (convert to RGB because the model expects 3 channels)
+ image = Image.open("your_image.jpg").convert("RGB")
+ inputs = processor(image, return_tensors="pt")
+
+ # Forward pass
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Get features
+ # outputs.last_hidden_state: (batch_size, num_patches, hidden_size)
+ # outputs.pooler_output: (batch_size, hidden_size) - pooled representation
+ features = outputs.last_hidden_state
+ pooled_features = outputs.pooler_output
+
+ print(f"Feature shape: {features.shape}")
+ print(f"Pooled feature shape: {pooled_features.shape}")
+ ```
+
+ ### Feature Extraction for Downstream Tasks
+
+ ```python
+ from transformers import SwinModel, AutoImageProcessor
+ from PIL import Image
+ import torch
+
+ model = SwinModel.from_pretrained("BiliSakura/RSBuilding-Swin-T")
+ processor = AutoImageProcessor.from_pretrained("BiliSakura/RSBuilding-Swin-T")
+
+ # Process image (convert to RGB because the model expects 3 channels)
+ image = Image.open("your_image.jpg").convert("RGB")
+ inputs = processor(image, return_tensors="pt")
+
+ # Extract features
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Use pooled features for classification/regression
+ features = outputs.pooler_output  # Shape: (1, 768)
+
+ # Or use last hidden state for dense prediction tasks
+ spatial_features = outputs.last_hidden_state  # Shape: (1, num_patches, 768)
+ ```
+
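For dense prediction it helps to know which part of the input each token in `last_hidden_state` corresponds to. The sketch below assumes that tokens are ordered row by row over a 7×7 grid and that each token nominally covers a 32×32 pixel cell of the resized 224×224 input (the `encoder_stride` of 32 in `config.json`). `token_to_pixel_box` is a hypothetical helper, and the box is only the nominal grid cell; attention lets each token see a wider context.

```python
def token_to_pixel_box(token_index, grid_size=7, stride=32):
    """Map a flat token index to its nominal (left, top, right, bottom) pixel box.

    Assumes row-major token order over a grid_size x grid_size grid.
    """
    if not 0 <= token_index < grid_size * grid_size:
        raise ValueError("token_index is outside the token grid")
    row, col = divmod(token_index, grid_size)
    return (col * stride, row * stride, (col + 1) * stride, (row + 1) * stride)


first_box = token_to_pixel_box(0)
second_row_box = token_to_pixel_box(8)
last_box = token_to_pixel_box(48)

print(first_box, second_row_box, last_box)
```

Under the same ordering assumption, a `(1, 49, 768)` tensor can be viewed as a `(1, 7, 7, 768)` feature map before it is passed to a segmentation or change-detection head.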
+ ## Model Configuration
+
+ The model uses the following configuration:
+ - `image_size`: 224
+ - `patch_size`: 4
+ - `num_channels`: 3
+ - `embed_dim`: 96
+ - `depths`: [2, 2, 6, 2]
+ - `num_heads`: [3, 6, 12, 24]
+ - `window_size`: 7
+ - `mlp_ratio`: 4.0
+ - `hidden_act`: "gelu"
+
+ ## Citation
+
+ If you use this model, please cite the original RSBuilding paper:
+
+ ```bibtex
+ @article{wangRSBuildingGeneralRemote2024a,
+   title = {{{RSBuilding}}: {{Toward General Remote Sensing Image Building Extraction}} and {{Change Detection With Foundation Model}}},
+   shorttitle = {{{RSBuilding}}},
+   author = {Wang, Mingze and Su, Lili and Yan, Cilin and Xu, Sheng and Yuan, Pengcheng and Jiang, Xiaolong and Zhang, Baochang},
+   year = {2024},
+   journal = {IEEE Transactions on Geoscience and Remote Sensing},
+   volume = {62},
+   pages = {1--17},
+   issn = {1558-0644},
+   doi = {10.1109/TGRS.2024.3439395},
+   keywords = {Building extraction,Buildings,change detection (CD),Data mining,Feature extraction,federated training,foundation model,Image segmentation,Remote sensing,remote sensing images,Task analysis,Training}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "architectures": [
+     "SwinModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "depths": [
+     2,
+     2,
+     6,
+     2
+   ],
+   "drop_path_rate": 0.15,
+   "dtype": "float32",
+   "embed_dim": 96,
+   "encoder_stride": 32,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "layer_norm_eps": 1e-05,
+   "mlp_ratio": 4.0,
+   "model_type": "swin",
+   "num_channels": 3,
+   "num_heads": [
+     3,
+     6,
+     12,
+     24
+   ],
+   "num_layers": 4,
+   "out_features": [
+     "stage4"
+   ],
+   "out_indices": [
+     4
+   ],
+   "patch_size": 4,
+   "qkv_bias": true,
+   "stage_names": [
+     "stem",
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "transformers_version": "5.0.0.dev0",
+   "use_absolute_embeddings": false,
+   "window_size": 7
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fc0eb4d63091ee665bdbb8a0c865d1716b96cf8cc1ba1ce60d8b60efd6241ab
+ size 110335368
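The three lines above are a Git LFS pointer, not the weights themselves; the `size` field is the byte count of the real file. As a rough plausibility check, the sketch below turns that size into an upper bound on the parameter count, assuming every tensor is stored as 4-byte float32 (as `"dtype": "float32"` in `config.json` suggests) and ignoring the small safetensors header.

```python
file_size_bytes = 110335368  # "size" field of the LFS pointer
bytes_per_param = 4          # assumption: all tensors are float32

# Upper bound: the header and any non-float32 tensors are not accounted for.
approx_params = file_size_bytes // bytes_per_param
approx_millions = approx_params / 1e6

print(f"about {approx_millions:.1f}M parameters")
```

A figure in the high twenty millions is in the range usually quoted for a Swin-Tiny backbone without a classification head, so the file size looks consistent with the architecture described in the README.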
preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "do_resize": true,
+   "size": {
+     "height": 224,
+     "width": 224
+   },
+   "resample": 3,
+   "do_rescale": true,
+   "rescale_factor": 0.00392156862745098,
+   "do_normalize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ]
+ }
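The preprocessor settings above amount to: resize to 224×224 (`resample: 3` is the bicubic filter in Pillow's numbering), rescale 8-bit values by 1/255, then normalize each channel with the listed mean and standard deviation. The per-pixel arithmetic can be sketched as follows; `normalize_pixel` is a hypothetical helper for illustration and leaves out the resize step.

```python
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Apply rescale then per-channel normalization to one 8-bit RGB pixel."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))

print(white)
print(black)
```

These are the commonly used ImageNet statistics, so inputs prepared by another pipeline would likely need the same rescale and normalization to match what `AutoImageProcessor` feeds the model.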