---
license: mit
library_name: sklearn
tags:
- remote-sensing
- tree-canopy
- sentinel-2
- philippines
- metro-manila
- civic-technology
pipeline_tag: tabular-classification
---
# Leaves.PH canopy classifier

The published per-pixel canopy model behind [leaves.ph](https://leaves.ph), an open, reproducible tree-cover map of Metro Manila's 17 local government units (16 cities plus the municipality of Pateros), covering 2019 to 2026.

A `HistGradientBoostingClassifier` (scikit-learn) that labels each 30 m Sentinel-2 pixel as canopy or not canopy, trained on **656 hand-labeled high-resolution pixels**. It is the source of every figure on the site: the map, the per-LGU and per-barangay series, and the headline NCR canopy percentage (about 9 to 10 percent).
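
Since every published figure is an aggregate of the per-pixel labels, the roll-up reduces to counting. The sketch below is a minimal illustration, not the project's pipeline code: it assumes equal-area 30 m pixels that have each already been assigned to a single LGU and given a 0/1 canopy label, and the LGU names and labels are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def canopy_percent_by_lgu(pixels: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percent of pixels labeled canopy (1) within each LGU."""
    flagged: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lgu, label in pixels:
        total[lgu] += 1
        flagged[lgu] += label
    return {lgu: 100.0 * flagged[lgu] / total[lgu] for lgu in total}


def overall_canopy_percent(pixels: List[Tuple[str, int]]) -> float:
    """Region-wide percent from pooled pixels, so each LGU counts in proportion to its area."""
    return 100.0 * sum(label for _, label in pixels) / len(pixels)


# Made-up labels (1 = canopy, 0 = not canopy); Makati has more pixels than Pateros here.
pixels = [("Pateros", 1)] + [("Pateros", 0)] * 3 + [("Makati", 1)] * 3 + [("Makati", 0)] * 3
print(canopy_percent_by_lgu(pixels))  # {'Pateros': 25.0, 'Makati': 50.0}
print(overall_canopy_percent(pixels))  # 40.0
```

Pooling pixels rather than averaging the per-LGU percentages (which would give 37.5 here) keeps a small LGU from counting as much as a large one.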
## Features (10)

`ndvi`, `dw` (Dynamic World tree probability), `meta_h` (Meta v2 1 m canopy height), `esatree` (ESA WorldCover tree class), the raw Sentinel-2 bands `red`, `nir`, `green` and `blue`, plus `gndvi` and `nir_red`. The decision threshold is 0.5, calibrated so that the 2021 map matches the 10.1 percent canopy share measured from the human labels.
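
The derived features can be computed from the raw bands roughly as sketched below. `ndvi` and `gndvi` use the standard normalized-difference formulas; the card does not define `nir_red`, so treating it as the simple ratio NIR / red is an assumption, as is encoding `esatree` as a 0/1 flag. The raw band values must be on the same scale as the training table, which this card does not state; the numbers below are illustrative reflectances only.

```python
from typing import Dict, List

# Feature order as listed on this card; the order the model actually expects
# is presumably the one recorded in canopy_clf_meta.json.
FEATURES = ["ndvi", "dw", "meta_h", "esatree",
            "red", "nir", "green", "blue", "gndvi", "nir_red"]


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def build_feature_row(red: float, nir: float, green: float, blue: float,
                      dw: float, meta_h: float, esatree: int) -> Dict[str, float]:
    """Assemble the 10 features for one pixel from raw bands and auxiliary layers."""
    return {
        "ndvi": normalized_difference(nir, red),
        "dw": dw,            # Dynamic World tree probability
        "meta_h": meta_h,    # Meta canopy height, metres
        "esatree": esatree,  # ASSUMPTION: 1 if ESA WorldCover says tree, else 0
        "red": red, "nir": nir, "green": green, "blue": blue,
        "gndvi": normalized_difference(nir, green),
        # ASSUMPTION: nir_red is the simple ratio NIR / red.
        "nir_red": nir / red if red != 0 else 0.0,
    }


row = build_feature_row(red=0.04, nir=0.32, green=0.06, blue=0.03,
                        dw=0.7, meta_h=8.0, esatree=1)
vector: List[float] = [row[name] for name in FEATURES]
```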
## Evaluation

Scored against the 656 manual labels under region-grouped out-of-fold cross-validation with post-stratified population weighting:

| Model | Precision | Recall | F1 | IoU |
|---|---|---|---|---|
| This classifier (10 features) | 0.77 | 0.79 | **0.78** | **0.64** |
| Four-feature model (no spectral bands) | - | - | 0.75 | - |
| NDVI > 0.62 baseline | 0.69 | 0.67 | 0.68 | 0.52 |

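
The F1 and IoU columns follow from precision and recall, which gives a quick consistency check on the table. With $TP$, $FP$ and $FN$ the (population-weighted) confusion counts, $P = TP/(TP+FP)$ and $R = TP/(TP+FN)$:

$$
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN},
\qquad
\mathrm{IoU} = \frac{TP}{TP + FP + FN} = \frac{F_1}{2 - F_1}.
$$

For this classifier, $2(0.77)(0.79)/(0.77 + 0.79) \approx 0.78$ and $0.78/(2 - 0.78) \approx 0.64$; for the NDVI baseline, $2(0.69)(0.67)/(0.69 + 0.67) \approx 0.68$ and $0.68/(2 - 0.68) \approx 0.52$, matching the reported values.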

The raw green and blue bands are what lift it over the baseline: they let it reject the high-NDVI grass and scrub that the NDVI threshold over-called (precision rises from 0.69 to 0.77), and the classifier also removes the year-to-year sawtooth that the fixed threshold produced. A CLIP ViT-L/14 embedding was tested as an additional feature and did not help, so it was dropped.
## Intended use and limits

- It is a tree-canopy estimate with a known grass/scrub margin, not a per-tree census and not a parcel-level land-use tool.
- "Canopy" is a dense-vegetation proxy: roughly half the flagged area is verified tree canopy above 5 m.
- The threshold is calibrated to one epoch (Meta's source imagery is mostly 2018 to 2020), so read the per-year values as annual cross-sectional snapshots, not as a validated change series.
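
The card does not describe how the 2021 calibration was done. One standard way to make a map reproduce a target area share is quantile matching: rank the pixel scores and pick the threshold that flags the target fraction. The sketch below illustrates that idea with the 10.1 percent figure; it is a swapped-in technique for illustration, not necessarily what Leaves.PH did, and the probabilities are made up.

```python
from typing import Sequence


def threshold_for_target_share(probs: Sequence[float], target_share: float) -> float:
    """Return a score t such that pixels with prob >= t make up about target_share of all pixels.

    Quantile matching: rank scores from high to low and take the k-th one,
    with k = round(target_share * n). Ties at t can push the share slightly higher.
    """
    if not 0.0 < target_share <= 1.0:
        raise ValueError("target_share must be in (0, 1]")
    ranked = sorted(probs, reverse=True)
    k = max(1, round(target_share * len(ranked)))
    return ranked[k - 1]


def canopy_share(probs: Sequence[float], threshold: float) -> float:
    """Fraction of pixels flagged as canopy at the given threshold."""
    return sum(p >= threshold for p in probs) / len(probs)


# Made-up per-pixel probabilities for one year.
year_probs = [i / 1000 for i in range(1000)]
t = threshold_for_target_share(year_probs, 0.101)  # 10.1 percent target from the card
print(t, canopy_share(year_probs, t))
```

A threshold fitted this way is tied to one year's score distribution, which is consistent with the caution above about reading the series as snapshots.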
## Files

- `canopy_clf.joblib` - the trained classifier
- `canopy_clf_meta.json` - features, threshold, training size
- `master_labels.csv` - the 656 gold labels
- `model_comparison.json` - the ablation it was selected from
- `RESULTS.md` - the build writeup
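
To use the published files directly, a minimal sketch with `huggingface_hub.hf_hub_download` and `joblib.load` follows. It assumes the files sit at the root of the `xmpuspus/leaves-ph` model repository as listed above, that `canopy_clf_meta.json` stores the feature order and threshold under keys named `features` and `threshold` (both key names are guesses; the code falls back to the values on this card), and that the classifier's classes are `[0, 1]`. `load_canopy_model` and `classify` are hypothetical helpers, not part of any library. Joblib files are pickles, so only load them from a source you trust.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

REPO_ID = "xmpuspus/leaves-ph"
# Fallbacks taken from this card; the key names used to read canopy_clf_meta.json
# ("features", "threshold") are assumptions about that file's layout.
DEFAULT_FEATURES = ["ndvi", "dw", "meta_h", "esatree",
                    "red", "nir", "green", "blue", "gndvi", "nir_red"]
DEFAULT_THRESHOLD = 0.5


def load_canopy_model(repo_id: str = REPO_ID) -> Tuple[Any, Dict[str, Any]]:
    """Download canopy_clf.joblib and canopy_clf_meta.json from the Hub."""
    # Lazy imports: classify() below can be used without these packages installed.
    import joblib
    from huggingface_hub import hf_hub_download

    # joblib.load unpickles the file: only do this for a repository you trust.
    clf = joblib.load(hf_hub_download(repo_id=repo_id, filename="canopy_clf.joblib"))
    meta_path = hf_hub_download(repo_id=repo_id, filename="canopy_clf_meta.json")
    with open(meta_path, encoding="utf-8") as fh:
        meta = json.load(fh)
    return clf, meta


def classify(clf: Any, rows: Sequence[Dict[str, float]], meta: Dict[str, Any]) -> List[int]:
    """Return 1 (canopy) or 0 for each row, using the meta feature order and threshold."""
    features = meta.get("features") or DEFAULT_FEATURES
    threshold = meta.get("threshold", DEFAULT_THRESHOLD)
    matrix = [[row[name] for name in features] for row in rows]
    # Assumes classes_ == [0, 1], so column 1 holds the canopy probability.
    return [int(p[1] >= threshold) for p in clf.predict_proba(matrix)]
```

Typical use would be `clf, meta = load_canopy_model()` followed by `classify(clf, rows, meta)`, where each row is a dict keyed by the ten feature names listed in the Features section.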
## Links and citation

- Code and pipeline: https://github.com/xmpuspus/leaves-ph (MIT)
- Data products on Hugging Face: https://huggingface.co/datasets/xmpuspus/leaves-ph (CC-BY-4.0)
- Methodology: https://leaves.ph/methodology
- Cite: Leaves.PH (2026, v0.7.0). https://doi.org/10.5281/zenodo.20470306