---
title: Remote audit
emoji: 📚
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
license: openrail
short_description: Remote auditor
---
Remote Audit App - Setup Instructions
This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery, implementing the conditional randomization test from the accompanying paper (see Citation below).
Quick Start
Option 1: Use Pre-computed Satellite Data (Recommended for HF)
The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:
- Download the pre-processed dataset with satellite features
- Place `Islam2019_WithGeocodesAndSatData.Rdata` in the app directory
- Select "Use Example (Islam 2019)" in the app
Option 2: Upload Your Own CSV
Your CSV should include:
- Treatment assignment column (e.g., `begum_treat` with values 1 = control, 2 = treatment)
- Satellite features: `ndvi_median`, `viirs_median` (or similar)
- Any other columns for reference
Example CSV structure:

```csv
id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
...
```
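Before uploading, a quick local check can catch the most common problems listed under Troubleshooting. The sketch below is hypothetical (it is not part of the app) and assumes the column names and 1/2 coding from the example above; change `TREATMENT_COL`, `FEATURE_COLS`, and `ALLOWED_ARMS` to match your file.

```python
import csv
import io

# Assumed names and coding, taken from the example CSV; adjust for your data
TREATMENT_COL = "begum_treat"
FEATURE_COLS = ["ndvi_median", "viirs_median"]
ALLOWED_ARMS = {"1", "2"}  # 1 = control, 2 = treatment


def check_audit_csv(text):
    """Return a list of problems found in CSV text (empty list = looks OK)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []
    # Exact, case-sensitive match, since the app's column names are case-sensitive
    for col in [TREATMENT_COL] + FEATURE_COLS:
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Blank/NA treatment values (or codes like "1.0") are flagged too
        arm = (row[TREATMENT_COL] or "").strip()
        if arm not in ALLOWED_ARMS:
            problems.append(f"line {line_no}: unexpected treatment value {arm!r}")
    return problems


sample = "id,begum_treat,ndvi_median,viirs_median\n1,1,0.45,2.3\n2,2,0.52,3.1\n"
print(check_audit_csv(sample))  # []
```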
Setting Up GEE for New Data
The app itself reads only pre-computed satellite features. To add GEE capabilities for computing features on the fly:
Prerequisites
- Google Earth Engine account (free): https://earthengine.google.com/signup/
- Python environment with `earthengine-api`
Installation Steps
```bash
# Install Earth Engine Python API
pip install earthengine-api
# Authenticate (first time only)
earthengine authenticate
```

```python
# Initialize in your script
import ee
ee.Initialize()  # newer releases may need ee.Initialize(project="your-cloud-project-id")
```
Computing Satellite Features
Use the GEE code from `RemoteAuditOfBrokenRCT.R` to compute features. In Python, for a single location:
```python
import ee

# Assumes ee.Initialize() has already been called (see Installation Steps)


def compute_satellite_features(lat, lon, start_date, end_date):
    """
    Compute median NDVI, EVI, and VIIRS night-lights values for a location.

    Args:
        lat, lon: Coordinates in decimal degrees
        start_date, end_date: Date range (YYYY-MM-DD); end_date is exclusive

    Returns:
        dict with ndvi_median, evi_median, viirs_median (None where masked)
    """
    point = ee.Geometry.Point([lon, lat])  # GEE expects [lon, lat] order

    # MODIS 16-day vegetation indices (250 m); stored values are scaled by 1e4
    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
             .filterDate(start_date, end_date)
             .select(['NDVI', 'EVI'])
             .map(lambda img: img.multiply(0.0001)))
    ndvi_median = modis.select('NDVI').median()
    evi_median = modis.select('EVI').median()

    # VIIRS monthly nighttime lights (average radiance band)
    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
             .filterDate(start_date, end_date)
             .select(['avg_rad']))
    viirs_median = viirs.median()

    # Read pixel values at the point. reduceRegion returns None for masked
    # bands, whereas sample(...).first() returns nothing and would crash here.
    values = (ee.Image.cat(ndvi_median, evi_median, viirs_median)
              .reduceRegion(reducer=ee.Reducer.first(),
                            geometry=point,
                            scale=250)
              .getInfo())

    return {
        'ndvi_median': values.get('NDVI'),
        'evi_median': values.get('EVI'),
        'viirs_median': values.get('avg_rad'),
    }
```
Integration with R (via reticulate)
To integrate GEE in your R workflow:
```r
# Set Python environment (before Python is first initialized in the session)
Sys.setenv(RETICULATE_PYTHON = "/path/to/python")
library(reticulate)

# Import and initialize Earth Engine
ee <- import("ee")
ee$Initialize()

# Define a Python function from R; it becomes available as py$get_features
py_run_string("
def get_features(lat, lon):
    # Your GEE code here
    return {'ndvi_median': ..., 'viirs_median': ...}
")

# Call the Python function from R (lat, lon: numeric coordinates of one unit)
features <- py$get_features(lat, lon)
```
App Configuration on Hugging Face
Required Files
- `app.R`: Main Shiny application
- `Dockerfile`: Container configuration
- `Islam2019_WithGeocodesAndSatData.Rdata`: Example dataset (optional)
Environment Variables
None required for basic functionality.
Secrets (if adding GEE)
If you want to enable on-the-fly GEE queries:
- Add a `GOOGLE_APPLICATION_CREDENTIALS` secret in the HF Space settings
- Upload the service account JSON
- Modify the app to call the GEE API
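Google client libraries read `GOOGLE_APPLICATION_CREDENTIALS` as a path to a key file, so if the Space secret holds the raw service-account JSON instead, it has to be written to disk first. A minimal sketch of a startup helper, hypothetical and not part of the app, that writes the JSON to a private temporary file and repoints the variable:

```python
import json
import os
import tempfile


def materialize_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """
    Hypothetical helper. If env_var holds raw service-account JSON, write it
    to a private temp file and point env_var at that file.
    Returns (client_email, key_file_path).
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    if value.lstrip().startswith("{"):
        info = json.loads(value)  # raises ValueError on malformed JSON
        fd, path = tempfile.mkstemp(suffix=".json")  # created with owner-only access
        with os.fdopen(fd, "w") as fh:
            json.dump(info, fh)
        os.environ[env_var] = path  # client libraries expect a file path here
    else:
        path = value  # already a path to a key file
        with open(path) as fh:
            info = json.load(fh)
    return info["client_email"], path
```

The returned email and path can then be handed to Earth Engine following its documented service-account pattern, `ee.Initialize(ee.ServiceAccountCredentials(email, key_path))`; how you call this from the R app (for example through reticulate) is up to your setup.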
Usage Guide
Running a Randomization Audit
- Load Data: Upload a CSV or use the example
- Configure Audit:
  - Audit Type: "Randomization"
  - Treatment Column: Select the column (e.g., `begum_treat`)
  - Control Value: 1
  - Treatment Value: 2
- Select Features: Check `ndvi_median` and `viirs_median`
- Choose Learner: Logistic (fast) or XGBoost (more flexible)
- Set Parameters:
  - K-Folds: 5-10 (higher = more robust)
  - Resamples: 1000-2000 (higher = more precise p-value)
- Run Audit: Click the "Run Audit" button
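To make the logic behind the p-value concrete, here is a deliberately simplified sketch of a predictability-based randomization test. It is an illustration, not the app's implementation: the learner is a nearest-class-mean classifier standing in for Logistic/XGBoost, the statistic is K-fold out-of-fold accuracy, and re-randomization is approximated by shuffling the treatment labels, which assumes complete randomization with a fixed number of treated units. The paper's conditional test may use a different statistic and resampling scheme, and all function names here are hypothetical.

```python
import random


def _standardize(X):
    """Scale each feature column to mean 0, SD 1 (constant columns just centred)."""
    cols = []
    for col in zip(*X):
        m = sum(col) / len(col)
        sd = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        cols.append([(v - m) / sd for v in col])
    return [list(row) for row in zip(*cols)]


def _cv_accuracy(X, y, k, seed):
    """Out-of-fold accuracy of a nearest-class-mean classifier with k folds."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # same fold split on every call
    correct = 0
    for f in range(k):
        test = order[f::k]
        held_out = set(test)
        means = {}
        for label in set(y):
            rows = [X[i] for i in range(n) if i not in held_out and y[i] == label]
            if rows:
                means[label] = [sum(c) / len(rows) for c in zip(*rows)]
        for i in test:
            dist = {lab: sum((a - b) ** 2 for a, b in zip(X[i], mu))
                    for lab, mu in means.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / n


def randomization_audit(X, y, k=5, n_resamples=1000, seed=0):
    """Return (observed out-of-fold accuracy, permutation p-value)."""
    X = _standardize(X)  # so NDVI and radiance are on comparable scales
    t_obs = _cv_accuracy(X, y, k, seed)
    rng = random.Random(seed + 1)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(y_perm)  # re-randomize, keeping the number treated fixed
        if _cv_accuracy(X, y_perm, k, seed) >= t_obs:
            exceed += 1
    # Add-one correction keeps the Monte Carlo p-value strictly above zero
    return t_obs, (1 + exceed) / (1 + n_resamples)
```

With `X` as a list of `[ndvi_median, viirs_median]` rows and `y` the 1/2 labels, `randomization_audit(X, y, k=5, n_resamples=1000)` returns the cross-validated accuracy and its p-value; run time grows roughly linearly in the number of resamples.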
Running a Missingness Audit
Same steps but:
- Audit Type: "Missingness"
- Select variable to check for missing data patterns
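Conceptually, a missingness audit asks the same question about a different label: is whether a variable is missing predictable from pre-treatment satellite features? A hypothetical helper that builds that 0/1 label from CSV rows (for example, dicts from `csv.DictReader`), assuming missing values appear as blanks or `NA`-style tokens:

```python
import math

# Strings treated as missing: an assumption, adjust to your file's conventions
MISSING_TOKENS = {"", "NA", "NaN", "nan"}


def _is_missing(value):
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip() in MISSING_TOKENS


def missingness_labels(rows, target, features):
    """
    Build (X, r) for a missingness audit: r = 1 if `target` is missing for a
    unit, else 0. Units whose satellite features are missing are skipped,
    since they cannot be scored.
    """
    X, r = [], []
    for row in rows:
        if any(_is_missing(row.get(f)) for f in features):
            continue
        X.append([float(row[f]) for f in features])
        r.append(1 if _is_missing(row.get(target)) else 0)
    return X, r
```

The resulting `r` takes the place of the treatment labels in a predictability test of the kind used for the randomization audit.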
Interpreting Results
- p < 0.05: Assignment is MORE predictable from satellite features than expected under the stated randomization → potential deviation from that design
- p ≥ 0.05: No evidence of deviation detected at the 5% level (this does not prove that randomization was carried out as stated)
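Because the p-value is estimated from a finite number of resamples, values close to 0.05 carry Monte Carlo error. Treating each resample as an independent draw gives the usual binomial approximation sqrt(p(1 - p) / B) for B resamples. A quick check of what the recommended 1000-2000 resamples buy near the 0.05 threshold:

```python
import math


def monte_carlo_se(p_hat, n_resamples):
    """Binomial approximation to the Monte Carlo SE of a resampled p-value."""
    return math.sqrt(p_hat * (1 - p_hat) / n_resamples)


for b in (1000, 2000):
    print(b, round(monte_carlo_se(0.05, b), 4))
# 1000 0.0069
# 2000 0.0049
```

At 1000 resamples an estimated p of 0.05 is only pinned down to roughly ±0.014 (two standard errors), so a result such as 0.045 versus 0.055 should not be over-interpreted.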
Technical Notes
Computation Time
- Logistic: ~1-3 minutes for 500 units, 1000 resamples
- XGBoost: ~3-10 minutes (depends on tree settings)
Memory Requirements
- Small datasets (<1000 units): 2GB RAM sufficient
- Large datasets (>5000 units): Consider 4GB+ RAM
Handling Missing Satellite Data
If your CSV has missing satellite features, the app drops those rows. To keep them, either:
- Impute the missing values before upload, or
- Use GEE to compute features for the missing locations
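For the GEE route, a hypothetical helper that queries only the rows that need it and never overwrites existing values. `feature_fn` is any function returning a dict keyed by feature name; with Earth Engine it could wrap `compute_satellite_features`, assuming your rows also carry latitude/longitude columns (not part of the CSV format above) and that you pass a pre-treatment date window.

```python
# Values treated as missing: an assumption, adjust to your file
MISSING = (None, "", "NA")


def fill_missing_features(rows, feature_fn, features=("ndvi_median", "viirs_median")):
    """
    Call feature_fn(row) only for rows with at least one missing feature and
    fill just the missing entries. Returns how many rows were queried.
    """
    queried = 0
    for row in rows:
        blanks = [f for f in features if row.get(f) in MISSING]
        if not blanks:
            continue
        computed = feature_fn(row)
        queried += 1
        for f in blanks:
            # A feature function may return None (e.g. for a masked pixel)
            if computed.get(f) is not None:
                row[f] = computed[f]
    return queried
```

For example, `fill_missing_features(rows, lambda r: compute_satellite_features(float(r["lat"]), float(r["lon"]), start, end))`, where `lat`, `lon`, `start`, and `end` are placeholders for your own columns and pre-treatment dates.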
Troubleshooting
"Feature not found" error
- Check that your CSV has columns named exactly `ndvi_median` and `viirs_median`
- Column names are case-sensitive
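Because matching is exact, headers such as `NDVI_median` or `viirs_median ` (with a trailing space) will not be found. A hypothetical pre-upload helper that spots such near-misses and suggests the exact names:

```python
def suggest_renames(header, required=("ndvi_median", "viirs_median")):
    """
    Return {existing_column: required_name} for columns that match a required
    name only after trimming whitespace and lowercasing. Assumes the required
    names are themselves lowercase, as they are here.
    """
    renames = {}
    for col in header:
        key = col.strip().lower()
        if key in required and col != key:
            renames[col] = key
    return renames


print(suggest_renames(["id", "NDVI_median", "viirs_median "]))
# {'NDVI_median': 'ndvi_median', 'viirs_median ': 'viirs_median'}
```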
"Too few complete cases" error
- Ensure at least 10 units have both valid treatment assignment and satellite features
- Check for NA values in your data
GEE authentication issues
```bash
# Re-authenticate
earthengine authenticate
# Check credentials
python -c "import ee; ee.Initialize(); print('Success!')"
```
Dockerfile build fails
```bash
# Test locally
docker build -t remote-audit .
docker run -p 7860:7860 remote-audit
```
Citation
If you use this app, please cite:
Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests of Randomization, Selection, and Missingness with Broadly Accessible Satellite Imagery.
Support
For issues or questions:
- Check the paper's technical appendix
- Review the example code in `RemoteAuditOfBrokenRCT.R`
- Contact: [your contact info]