# AutoFarm Data Source Registry

This document records the public sources used by the active data bootstrap pipeline and the local cache that supports reproducible rebuilds.

## Provenance Status

- `exact`: the upstream endpoint or asset is known and documented.
- `page-only`: the upstream source page is known, but the exact downloaded file URL is not preserved locally.
- `repo-derived`: the local file is derived from another source already present in the repository.
- `unresolved`: the exact upstream acquisition path is not recoverable from the current repository state.

## Assets Used By The Current Pipeline

| Local asset | Current use | Upstream source | Provenance status | Evidence |
|---|---|---|---|---|
| `data_local/downloads/usda_soil/` | `zone_state_bootstrap` cache | USDA Soil Data Access POST endpoint: | `exact` | Used by [`build_zone_state_bootstrap.py`](../scripts/build_zone_state_bootstrap.py). |
| live Open-Meteo archive queries | `zone_state_bootstrap` | Archive API: | `exact` | Used by [`build_zone_state_bootstrap.py`](../scripts/build_zone_state_bootstrap.py). |
| live Open-Meteo forecast queries | `zone_state_bootstrap` | Forecast API: | `exact` | Used by [`build_zone_state_bootstrap.py`](../scripts/build_zone_state_bootstrap.py). |
| live Open-Meteo elevation queries | `zone_state_bootstrap` | Elevation API: | `exact` | Used by [`build_zone_state_bootstrap.py`](../scripts/build_zone_state_bootstrap.py). |
| live SoilGrids fallback queries | `zone_state_bootstrap` fallback | SoilGrids REST query endpoint: | `exact` | Written into the dataset card by [`build_zone_state_bootstrap.py`](../scripts/build_zone_state_bootstrap.py). |

## Rebuild Notes

When recreating the processed public-data outputs in a fresh environment:

1. Run `python scripts/run_public_data_pipeline.py`.
2. Verify that `data/processed/zone_state_bootstrap.parquet` exists.
3. Confirm that `data/processed/zone_state_bootstrap.dataset_card.json` records the active upstream sources.
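
The two verification steps in the rebuild notes could be automated along the following lines. This is only a minimal sketch, not part of the pipeline: the function name `check_rebuild_outputs` and the keyword list are hypothetical, and because the dataset card's schema is not documented here, the check merely searches the card's JSON text for source names rather than reading specific fields.

```python
import json
from pathlib import Path

# Output paths taken from the rebuild notes.
PARQUET_PATH = Path("data/processed/zone_state_bootstrap.parquet")
CARD_PATH = Path("data/processed/zone_state_bootstrap.dataset_card.json")

# Hypothetical keywords: the card's real schema and wording are assumptions,
# so this check only looks for these substrings anywhere in the JSON text.
EXPECTED_SOURCE_KEYWORDS = ("soil data access", "open-meteo", "soilgrids")


def check_rebuild_outputs(root=Path("."), keywords=EXPECTED_SOURCE_KEYWORDS):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    parquet = root / PARQUET_PATH
    card = root / CARD_PATH
    problems = []

    # Existence check for the processed parquet output.
    if not parquet.is_file():
        problems.append(f"missing parquet output: {parquet}")

    # The card must exist and parse before its contents can be checked.
    if not card.is_file():
        problems.append(f"missing dataset card: {card}")
        return problems
    try:
        card_data = json.loads(card.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        problems.append(f"dataset card is not valid JSON: {exc}")
        return problems

    # Re-serialize so the case-insensitive search covers every key and value.
    card_text = json.dumps(card_data).lower()
    for keyword in keywords:
        if keyword.lower() not in card_text:
            problems.append(f"dataset card does not mention: {keyword}")
    return problems
```

Calling `check_rebuild_outputs()` from the repository root after the pipeline run returns an empty list when both files are present and every keyword appears in the card. The keyword tuple should be adjusted to match whatever strings the card actually records.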