AutoFarm Data Source Registry
This document records the public sources used by the active data bootstrap pipeline and the local cache that supports reproducible rebuilds.
Provenance Status
- exact: the upstream endpoint or asset is known and documented.
- page-only: the upstream source page is known, but the exact downloaded file URL is not preserved locally.
- repo-derived: the local file is derived from another source already present in the repository.
- unresolved: the exact upstream acquisition path is not recoverable from the current repository state.
Assets Used By The Current Pipeline
| Local asset | Current use | Upstream source | Provenance status | Evidence |
|---|---|---|---|---|
| data_local/downloads/usda_soil/ | zone_state_bootstrap cache | USDA Soil Data Access POST endpoint: https://sdmdataaccess.sc.egov.usda.gov/Tabular/post.rest | exact | Used by build_zone_state_bootstrap.py. |
| live Open-Meteo archive queries | zone_state_bootstrap | Archive API: https://archive-api.open-meteo.com/v1/archive | exact | Used by build_zone_state_bootstrap.py. |
| live Open-Meteo forecast queries | zone_state_bootstrap | Forecast API: https://api.open-meteo.com/v1/forecast | exact | Used by build_zone_state_bootstrap.py. |
| live Open-Meteo elevation queries | zone_state_bootstrap | Elevation API: https://api.open-meteo.com/v1/elevation | exact | Used by build_zone_state_bootstrap.py. |
| live SoilGrids fallback queries | zone_state_bootstrap fallback | SoilGrids REST query endpoint: https://rest.isric.org/soilgrids/v2.0/properties/query | exact | Written into the dataset card by build_zone_state_bootstrap.py. |
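
To make the archive row concrete, the sketch below builds a request URL against the Open-Meteo archive endpoint listed in the table. It is only an illustration: the helper name build_archive_url, the coordinates, the date range, and the chosen daily variables are assumptions, and the parameters that build_zone_state_bootstrap.py actually sends are not recorded in this document.

```python
from urllib.parse import urlencode

# Endpoint taken from the registry table.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date, daily):
    """Return an archive query URL (hypothetical helper, not the pipeline's code)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO date, e.g. "2023-01-01"
        "end_date": end_date,
        "daily": ",".join(daily),  # comma-separated daily variables
        "timezone": "GMT",
    }
    # safe="," keeps the variable list readable instead of encoding commas as %2C.
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Illustrative values only; the real zone coordinates live elsewhere.
    print(
        build_archive_url(
            41.5,
            -93.6,
            "2023-01-01",
            "2023-01-31",
            ["temperature_2m_max", "precipitation_sum"],
        )
    )
```

Building the URL separately from sending the request keeps the query easy to log next to the cached response, which helps when a rebuild needs to be compared against an earlier run.
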
Rebuild Notes
When recreating the processed public-data outputs in a fresh environment:
- run python scripts/run_public_data_pipeline.py,
- verify that data/processed/zone_state_bootstrap.parquet exists,
- confirm that data/processed/zone_state_bootstrap.dataset_card.json records the active upstream sources.
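
The two verification steps above could be automated roughly as follows. This is a sketch under stated assumptions: the schema of the dataset card is not described here, so the check simply walks every string value in the JSON and looks for each endpoint URL from the registry table. The names check_rebuild and iter_strings are hypothetical and are not part of the pipeline.

```python
import json
from pathlib import Path

# Endpoints listed in the registry table.
EXPECTED_SOURCES = [
    "https://sdmdataaccess.sc.egov.usda.gov/Tabular/post.rest",
    "https://archive-api.open-meteo.com/v1/archive",
    "https://api.open-meteo.com/v1/forecast",
    "https://api.open-meteo.com/v1/elevation",
    "https://rest.isric.org/soilgrids/v2.0/properties/query",
]


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def check_rebuild(processed_dir):
    """Return a list of problems; an empty list means both checks passed."""
    processed_dir = Path(processed_dir)
    parquet_path = processed_dir / "zone_state_bootstrap.parquet"
    card_path = processed_dir / "zone_state_bootstrap.dataset_card.json"

    problems = []
    if not parquet_path.is_file():
        problems.append(f"missing {parquet_path}")
    if not card_path.is_file():
        problems.append(f"missing {card_path}")
        return problems

    # Schema-agnostic check (assumption): a source counts as recorded if its
    # URL appears inside any string value of the card.
    card = json.loads(card_path.read_text(encoding="utf-8"))
    strings = list(iter_strings(card))
    for url in EXPECTED_SOURCES:
        if not any(url in text for text in strings):
            problems.append(f"dataset card does not mention {url}")
    return problems


if __name__ == "__main__":
    for problem in check_rebuild("data/processed"):
        print(problem)
```

Run from the repository root after the pipeline finishes; no output means the parquet file exists and every listed endpoint appears somewhere in the dataset card.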