	---
	license: cc-by-4.0
	tags:
	- image-segmentation
	- image-classification
	- solar-panels
	- photovoltaic
	- remote-sensing
	- aerial-imagery
	- pytorch
	datasets:
	- gabrielkasmi/bdappv
	---

	# BDAPPV Models

	Baseline models for the [BDAPPV dataset](https://huggingface.co/datasets/gabrielkasmi/bdappv) — aerial images of rooftop photovoltaic installations in France and Belgium.

	Paper: [Kasmi et al., Scientific Data, 2023](https://doi.org/10.1038/s41597-023-01951-4) — [arXiv:2209.03726](https://arxiv.org/abs/2209.03726)

	---

	## Models

	Two architectures, trained independently on each imagery provider:

	| File | Task | Architecture | Provider |
	|------|------|--------------|----------|
	| `deeplab_google_best.pth` | Segmentation | DeepLabV3-ResNet101 | Google |
	| `deeplab_ign_best.pth` | Segmentation | DeepLabV3-ResNet101 | IGN |
	| `inception_google_best.pth` | Classification | InceptionV3 | Google |
	| `inception_ign_best.pth` | Classification | InceptionV3 | IGN |

	Note on training data and licensing: checkpoints fine-tuned on the
	Google subset of BDAPPV (`_google_`) derive from imagery distributed
	under CC-BY-NC 4.0. Commercial users should prefer the IGN-trained
	checkpoints (`_ign_`, CC-BY 4.0 imagery) or check that their use is
	compatible with the non-commercial terms. All models are initialized
	from Mayer et al. (2022) checkpoints; see their repository for base
	model licensing.

	---

	## Benchmark protocol

	Three evaluation tracks are defined:

	Track 1 — Segmentation (single provider)
	Train and evaluate on the same provider. Report IoU and F1 on the test split.

	Track 2 — Classification (single provider)
	Train and evaluate on the same provider. Report accuracy and F1 on the test split.

	Track 3 — Distribution shift (cross-provider)
	Train on Google, evaluate on the IGN test split. This is the primary robustness benchmark. Report IoU.

	Rules:
	- The test split must not be used for model selection or hyperparameter tuning — validation split only.
	- The spatial holdout by department must not be modified. Re-splitting invalidates comparability with published results.
	- For Track 3, only the Google training split may be used for training.
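
As a rough illustration, the tracks and rules above could be written down as a small run matrix. This is only a sketch: the `Run` dataclass, `benchmark_runs`, and the split name `"val"` are hypothetical names chosen for the example, and treating Track 3 as a segmentation task is inferred from the fact that it reports IoU.

```python
from dataclasses import dataclass

PROVIDERS = ("google", "ign")

# Hypothetical name for the split used for model selection (never the test split).
MODEL_SELECTION_SPLIT = "val"


@dataclass(frozen=True)
class Run:
    track: int
    task: str
    train_provider: str
    test_provider: str
    metrics: tuple


def benchmark_runs():
    """Enumerate the runs implied by the three tracks."""
    runs = []
    for provider in PROVIDERS:
        # Tracks 1 and 2: train and evaluate on the same provider.
        runs.append(Run(1, "segmentation", provider, provider, ("iou", "f1")))
        runs.append(Run(2, "classification", provider, provider, ("accuracy", "f1")))
    # Track 3: train on Google only, evaluate on the IGN test split.
    # Assumption: the task is segmentation, since the track reports IoU.
    runs.append(Run(3, "segmentation", "google", "ign", ("iou",)))
    return runs


for run in benchmark_runs():
    print(run)
```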

	---

	## Results

	Models are evaluated on the official test split (spatial holdout by French department; see the dataset card for details).

	### Segmentation (DeepLabV3-ResNet101)

	| Train | Test | IoU | F1 | n (test) |
	|-------|------|-----|----|----------|
	| Google | Google | 0.884 | 0.937 | 1,935 |
	| IGN | IGN | 0.735 | 0.844 | 1,239 |
	| Google | IGN | 0.561 | 0.709 | 1,239 |
	| IGN | Google | 0.657 | 0.786 | 1,935 |
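
For readers reproducing these numbers, here is a minimal sketch of how IoU and F1 can be computed for a single pair of binary masks with NumPy. It is not the evaluation code used for the table: the function name `iou_and_f1` is hypothetical, the handling of two empty masks is an assumption, and this card does not state whether scores are averaged per image or pooled over all pixels.

```python
import numpy as np


def iou_and_f1(pred, target):
    """Return (IoU, F1) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # predicted PV, labelled PV
    fp = np.logical_and(pred, ~target).sum()   # predicted PV, labelled background
    fn = np.logical_and(~pred, target).sum()   # missed PV pixels

    union = tp + fp + fn
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0, 1.0

    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
iou, f1 = iou_and_f1(pred, target)
print(f"IoU={iou:.3f}  F1={f1:.3f}")  # IoU=0.500  F1=0.667
```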

	### Classification (InceptionV3)

	| Train | Test | Accuracy | Precision | Recall | F1 | n (test) |
	|-------|------|----------|-----------|--------|----|----------|
	| Google | Google | 0.952 | 0.990 | 0.912 | 0.949 | 3,884 |
	| IGN | IGN | 0.640 | 0.831 | 0.309 | 0.451 | 2,593 |
	| Google | IGN | 0.592 | 0.815 | 0.188 | 0.306 | 2,593 |
	| IGN | Google | 0.543 | 1.000 | 0.083 | 0.153 | 3,884 |

	Note on classification cross-provider results: the IGN-trained model collapses on Google imagery (recall 0.083, precision 1.000). It almost never predicts the positive class, which is a degenerate operating point. This illustrates the severity of the distribution shift documented in [Kasmi et al. (2025)](https://doi.org/10.1017/eds.2025.13).
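
The link between the precision, recall, and F1 columns can be checked directly, since F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the rounded table values; the helper name `f1_score` is chosen for the example, and small differences in the third decimal are expected because the inputs are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (train, test): (precision, recall, reported F1), copied from the classification table.
rows = {
    ("Google", "Google"): (0.990, 0.912, 0.949),
    ("IGN", "IGN"): (0.831, 0.309, 0.451),
    ("Google", "IGN"): (0.815, 0.188, 0.306),
    ("IGN", "Google"): (1.000, 0.083, 0.153),
}

for (train, test), (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{train:>6} -> {test:<6} computed F1={computed:.3f}  reported F1={reported:.3f}")
```

The last row shows why perfect precision is not reassuring on its own: with recall near 0.08, the harmonic mean stays close to the smaller of the two values.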

	---

	## Usage

	A `model.py` helper is included in this repo to simplify loading:

	```python
	from huggingface_hub import hf_hub_download
	import importlib.util

	# Download the helper module from the model repo and import it from its local path.
	path = hf_hub_download("gabrielkasmi/bdappv-models", "model.py")
	spec = importlib.util.spec_from_file_location("bdappv_model", path)
	mod = importlib.util.module_from_spec(spec)
	spec.loader.exec_module(mod)

	# Load the checkpoints trained on the chosen imagery provider.
	seg = mod.load_segmentation_model("google") # or "ign"
	clf = mod.load_classification_model("google") # or "ign"
	```

	Both functions return the model in `eval()` mode. An optional `device` argument is supported (`"cpu"`, `"cuda"`, `"mps"`).
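
One possible way to choose a value for the `device` argument is sketched below. The helper `pick_device` is hypothetical and not part of `model.py`; it falls back to `"cpu"` when PyTorch is not installed or no accelerator is found. Passing the result as a keyword named `device` is an assumption based on the sentence above.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what is available."""
    try:
        import torch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend only exists in recent PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(device)

# With `mod` loaded as in the snippet above (keyword name assumed):
# seg = mod.load_segmentation_model("google", device=device)
```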

	---

	## Training

	Models were trained on the official BDAPPV splits with the following settings:

	- Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
	- Scheduler: Cosine annealing
	- Effective batch size: 32 (batch 16 × grad accum 2)
	- Early stopping: patience=7 epochs on validation metric
	- Input size: 400×400 px
	- Initialization: checkpoints from [Mayer et al. (2022)](https://doi.org/10.1016/j.apenergy.2021.118469), who fine-tuned DeepLabV3-ResNet101 and InceptionV3 on 10 cm/px orthoimagery from North Rhine-Westphalia (Germany) for rooftop PV detection. These checkpoints were then further fine-tuned on BDAPPV using the splits above.

	Training scripts are available in the [BDAPPV dataset repository](https://huggingface.co/datasets/gabrielkasmi/bdappv).
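
To make the schedule and stopping rule concrete, here is a small pure-Python sketch of the settings listed above. It is not the training script: the total number of epochs, the minimum learning rate of 0, and the assumption that a higher validation metric is better are all illustrative choices that this card does not specify, and `cosine_lr` and `EarlyStopping` are names chosen for the example.

```python
import math

BASE_LR = 1e-4          # from the settings above
BATCH_SIZE = 16         # per-step batch
GRAD_ACCUM_STEPS = 2    # gradient accumulation
PATIENCE = 7            # early-stopping patience in epochs

effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS  # 32


def cosine_lr(epoch, total_epochs, base_lr=BASE_LR, min_lr=0.0):
    """Cosine-annealed learning rate; total_epochs and min_lr are assumptions."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


class EarlyStopping:
    """Stop once the validation metric has not improved for `patience` epochs.

    Assumption: higher is better (for example validation IoU or accuracy).
    """

    def __init__(self, patience=PATIENCE):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Learning rate at the start, midpoint, and end of a hypothetical 50-epoch run.
for epoch in (0, 25, 50):
    print(epoch, cosine_lr(epoch, total_epochs=50))
```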

	---

	## Citation

	If you use these models, please cite:

	```bibtex
	@article{kasmi2022towards,
	  title={Towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mapping},
	  author={Kasmi, Gabriel and Dubus, Laurent and Blanc, Philippe and Saint-Drenan, Yves-Marie},
	  journal={arXiv preprint arXiv:2207.07466},
	  year={2022}
	}
	```

	## References

	- Mayer et al. (2022). [3D-PV-Locator: Large-scale detection of rooftop-mounted photovoltaic systems in 3D.](https://doi.org/10.1016/j.apenergy.2021.118469) Applied Energy, 310, 118469. (source of the base checkpoints)
	- Kasmi et al. (2023). [A crowdsourced dataset of aerial images with annotated solar photovoltaic arrays and installation metadata.](https://doi.org/10.1038/s41597-023-01951-4) Scientific Data, 10, 59. (BDAPPV dataset)
	- Kasmi et al. (2025). [Space-scale exploration of the poor reliability of deep learning models: the case of the remote sensing of rooftop photovoltaic systems.](https://doi.org/10.1017/eds.2025.13) Environmental Data Science. (cross-provider distribution shift)