---
language: en
license: apache-2.0
datasets:
- zenodo
- huggingface
tags:
- earth-observation
- remote-sensing
- segmentation
- multi-modal
- optical
- sentinel-1
- sentinel-2
- computer-vision
pipeline_tag: image-segmentation
inference: true
---

# SkySense++

SkySense++ is a semantic-enhanced multi-modal remote sensing **foundation model** for Earth observation. It fuses high-resolution optical imagery (HR), Sentinel-2 (S2), and Sentinel-1 SAR (S1) through independent backbones, an optional modality-completion VAE, and a shared transformer fusion encoder.

**Primary use: representation extraction.** The pretrained backbones produce rich feature representations for downstream tasks (classification, segmentation, regression). Extract `features_hr`, `features_s2`, `features_s1`, or `features_fusion` and feed them to your task-specific head. Fine-tuning on your target dataset is required. See the main [SkySensePlusPlus](https://github.com/zqcrafts/SkySense-O) repository for pretraining, 1-shot, and fine-tuning workflows.
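
To make the representation-extraction workflow concrete, here is a minimal sketch of a task-specific head. The `SegmentationHead` class is ours, not part of the checkpoint; the 1024-channel fused features and the 65-class count come from this card, and a dummy tensor stands in for the real model output:

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Hypothetical downstream head on top of frozen SkySense++ features."""

    def __init__(self, in_channels: int = 1024, num_classes: int = 65):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features: torch.Tensor, out_size: int = 512) -> torch.Tensor:
        logits = self.classifier(features)  # (B, num_classes, h, w)
        # Upsample coarse logits back to the input resolution
        return nn.functional.interpolate(
            logits, size=(out_size, out_size), mode="bilinear", align_corners=False
        )

# Dummy tensor standing in for out["features_fusion"]
features_fusion = torch.randn(2, 1024, 16, 16)
head = SegmentationHead()
logits = head(features_fusion)
print(logits.shape)  # torch.Size([2, 65, 512, 512])
```

A 1×1 convolution plus bilinear upsampling is only the simplest possible head; any decoder that accepts `(B, 1024, H, W)` features will do.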

## Model Metadata

| | Attribute | Value | |
| |-----------|-------| |
| | **Model type** | Multi-modal segmentation (HR + S2 + S1) | |
| | **Paper** | [SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation](https://www.nature.com/articles/s42256-025-01078-8) | |
| | **Publication** | Nature Machine Intelligence, 2025 | |
| | **License** | Apache-2.0 | |
| | **Input modalities** | High-resolution optical, Sentinel-2, Sentinel-1 | |
| | **Output** | Semantic segmentation (65 classes) | |
| | **Checkpoint contents** | Backbone weights only; segmentation head not pretrained | |
| | **HR input size** | 512×512 | |
| | **S2/S1 patch size** | 16×16 | |

## Model Variants

| | Variant | Path | Sources | Use Modal VAE | Description | |
| |---------|------|---------|---------------|-------------| |
| | **full** (default) | `.` | hr, s2, s1 | Yes | All three modalities with VAE completion | |
| | **hr** | `hr/` | hr | No | High-resolution optical only | |
| | **s2** | `s2/` | s2 | No | Sentinel-2 only | |
| | **s1** | `s1/` | s1 | No | Sentinel-1 only | |

### Repository structure (full variant, diffusers layout)

```
.
├── config.json
├── model.safetensors
├── modality_vae/                 # VAE subfolder (diffusers standard)
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── modeling_skysensepp.py
├── configuration_skysensepp.py
├── pipeline_skysensepp.py
├── sky_sensepp_impl/             # ModalityCompletionVAE, ModalityCompletionVAEPipeline in necks/
└── hr/, s2/, s1/                 # Single-modality variants
```

The VAE loads automatically from the `modality_vae/` subfolder. A legacy `modality_vae.safetensors` at the repository root is also supported; migrate it with:
`python tools/split_vae_from_checkpoint.py --model-dir path/to/model --migrate`

## Installation

```bash
pip install transformers torch safetensors diffusers
```

The modality VAE uses diffusers `VQModel`. Legacy checkpoints (ConvVQVAEv2) load via a backward-compatible fallback.

## Usage

### Diffusers-style loading and inference

The VAE follows the [diffusers](https://huggingface.co/docs/diffusers) layout: the model sits in a `modality_vae/` subfolder with `config.json` and `diffusion_pytorch_model.safetensors`. Load and run inference like this:

```python
import torch
from transformers import AutoModel

# Load full model (VAE auto-loads from modality_vae/ subfolder, diffusers-style)
model = AutoModel.from_pretrained("path/to/SkySensepp", trust_remote_code=True)
model = model.eval().to("cuda")

# Prepare inputs
hr_img = torch.randn(1, 3, 512, 512, device="cuda")
s2_img = torch.randn(1, 10, 2, 256, 256, device="cuda")  # B, 10 bands, S steps, H, W
s1_img = torch.randn(1, 2, 2, 256, 256, device="cuda")   # B, 2 bands, S steps, H, W
modalities = torch.ones(1, 3, dtype=torch.bool, device="cuda")  # [hr, s2, s1] present

# Inference
with torch.no_grad():
    out = model(
        hr_img=hr_img,
        s2_img=s2_img,
        s1_img=s1_img,
        modality_flag_hr=modalities[:, :1],
        modality_flag_s2=modalities[:, 1:2],
        modality_flag_s1=modalities[:, 2:],
        return_features=True,
    )

features_fusion = out["features_fusion"]
logits_hr = out.get("logits_hr")
```
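
The boolean flags also let the full variant run when a source is unavailable, with the modality-completion VAE reconstructing features for the absent input. Below is a minimal sketch of the flag tensors for a batch without Sentinel-1; passing a zero placeholder for the missing image is our assumption here, not a documented contract, so check `modeling_skysensepp.py` for the exact convention:

```python
import torch

# Flag layout follows the example above: [hr, s2, s1]
modalities = torch.tensor([[True, True, False]])  # S1 missing for this batch

# Assumption: a zero tensor stands in for the absent modality; the VAE
# completes its features from the HR and S2 branches.
s1_placeholder = torch.zeros(1, 2, 2, 256, 256)

flag_hr = modalities[:, :1]   # tensor([[True]])
flag_s2 = modalities[:, 1:2]  # tensor([[True]])
flag_s1 = modalities[:, 2:]   # tensor([[False]])
```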

### Load VAE component only (diffusers-style)

```python
import torch

from sky_sensepp_impl.necks import ModalityCompletionVAE

# Load VAE from subfolder (same pattern as diffusers Stable Diffusion VAE)
vae = ModalityCompletionVAE.from_pretrained(
    "path/to/SkySensepp",
    subfolder="modality_vae",
)
vae = vae.eval().to("cuda")

# Run modality completion on backbone features (e.g. 2816-d, 16×16)
feat_hr = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s2 = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s1 = torch.randn(1, 2816, 16, 16, device="cuda")
modality_info = torch.ones(1, 3, dtype=torch.bool, device="cuda")

with torch.no_grad():
    out = vae(feat_hr, feat_s2, feat_s1, modality_info)

hr_out = out["hr_out"]
s2_out = out["s2_out"]
s1_out = out["s1_out"]
```

### ModalityCompletionVAEPipeline (modular, diffusers-style)

```python
import torch

from sky_sensepp_impl.necks import ModalityCompletionVAE, ModalityCompletionVAEPipeline

# Load pipeline (VAE from modality_vae/ subfolder)
pipe = ModalityCompletionVAEPipeline.from_pretrained(
    "path/to/SkySensepp",
    subfolder="modality_vae",
)
pipe = pipe.to("cuda")

# Backbone features (same shapes as in the VAE example above)
feat_hr = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s2 = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s1 = torch.randn(1, 2816, 16, 16, device="cuda")
modality_info = torch.ones(1, 3, dtype=torch.bool, device="cuda")

# Inference on features
out = pipe(
    feat_hr=feat_hr,
    feat_s2=feat_s2,
    feat_s1=feat_s1,
    modality_info=modality_info,
)
hr_out, s2_out, s1_out = out["hr_out"], out["s2_out"], out["s1_out"]

# Modular: inject a custom VAE
custom_vae = ModalityCompletionVAE.from_pretrained("path/to/custom_vae")
pipe = ModalityCompletionVAEPipeline.from_pretrained("path/to/SkySensepp", vae=custom_vae)

# Or swap components after load
pipe.register_components(vae=custom_vae)
```

### Load model and attach VAE manually

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/SkySensepp", trust_remote_code=True)
model.load_vae(
    pretrained_model_name_or_path="path/to/SkySensepp",
    subfolder="modality_vae",
)
```

### Variants (single-modality, no VAE)

```python
from transformers import AutoModel

model_hr = AutoModel.from_pretrained("path/to/SkySensepp/hr", trust_remote_code=True)
model_s2 = AutoModel.from_pretrained("path/to/SkySensepp/s2", trust_remote_code=True)
model_s1 = AutoModel.from_pretrained("path/to/SkySensepp/s1", trust_remote_code=True)
```

### Representation shapes (HR-only)

| | Output | Shape | Description | |
| |--------|-------|-------------| |
| `features_hr[i]` | list of 4 maps | Backbone features at 4 scales (stages 0–3) |
| | `features_fusion` | `(B, 1024, H, W)` | Fused spatial representation for downstream head | |
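
Since `features_hr` is a list of four feature maps, one simple way to consume it is to resize every scale to a shared grid and concatenate. The channel counts and spatial sizes below are invented for illustration only; inspect your actual outputs:

```python
import torch
import torch.nn as nn

# Placeholder multi-scale features -- NOT the model's real dimensions
features_hr = [
    torch.randn(1, 128, 128, 128),  # stage 0
    torch.randn(1, 256, 64, 64),    # stage 1
    torch.randn(1, 512, 32, 32),    # stage 2
    torch.randn(1, 1024, 16, 16),   # stage 3
]

# Resize all stages to the finest grid and concatenate along channels
target = features_hr[0].shape[-2:]
merged = torch.cat(
    [nn.functional.interpolate(f, size=target, mode="bilinear", align_corners=False)
     for f in features_hr],
    dim=1,
)
print(merged.shape)  # torch.Size([1, 1920, 128, 128])
```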

## Input Formats

| | Modality | Shape | Description | |
| |----------|-------|-------------| |
| | **hr_img** | `(B, 3, H, W)` | RGB high-res, H=W=512 typical | |
| | **s2_img** | `(B, 10, S, H, W)` | Sentinel-2, 10 bands, S time steps | |
| | **s1_img** | `(B, 2, S, H, W)` | Sentinel-1 VV/VH, S time steps | |
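
The table above can be turned into a quick sanity check before calling the model. `check_inputs` is a hypothetical helper of ours, not part of the package:

```python
import torch

def check_inputs(hr_img: torch.Tensor, s2_img: torch.Tensor, s1_img: torch.Tensor) -> None:
    """Validate tensor shapes against the Input Formats table."""
    b = hr_img.shape[0]
    assert hr_img.ndim == 4 and hr_img.shape[1] == 3, "hr_img must be (B, 3, H, W)"
    assert s2_img.ndim == 5 and s2_img.shape[:2] == (b, 10), "s2_img must be (B, 10, S, H, W)"
    assert s1_img.ndim == 5 and s1_img.shape[:2] == (b, 2), "s1_img must be (B, 2, S, H, W)"
    assert s2_img.shape[2] == s1_img.shape[2], "S2 and S1 need the same number of time steps"

# Passes silently for shapes matching the table
check_inputs(
    torch.randn(1, 3, 512, 512),
    torch.randn(1, 10, 2, 256, 256),
    torch.randn(1, 2, 2, 256, 256),
)
```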

## Citation

```bibtex
@article{skysensepp2025,
  title={SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation},
  journal={Nature Machine Intelligence},
  year={2025},
  url={https://www.nature.com/articles/s42256-025-01078-8}
}
```

## References

- [Project Page](https://zqcrafts.github.io/SkySense-O/project.html)
- [Zenodo Datasets](https://zenodo.org/records/15010418)
- [Hugging Face SkySense](https://huggingface.co/KKKKKKang/JL-16)