---
title: Remote audit
emoji: 📚
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
license: openrail
short_description: Remote auditor
---

# Remote Audit App - Setup Instructions

This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery, implementing the conditional randomization test from the accompanying paper (Jerzak & Daoud, 2025).

## Quick Start

### Option 1: Use Pre-computed Satellite Data (Recommended for Hugging Face)

The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:

1. Download the pre-processed dataset with satellite features.
2. Place `Islam2019_WithGeocodesAndSatData.Rdata` in the app directory.
3. Select "Use Example (Islam 2019)" in the app.

### Option 2: Upload Your Own CSV

Your CSV should include:

- A treatment assignment column (e.g., `begum_treat` with values 1 = control, 2 = treatment)
- Satellite features: `ndvi_median`, `viirs_median` (or similar)
- Any other columns for reference

Example CSV structure:

```
id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
...
```

## Setting Up GEE for New Data

The app uses **pre-computed** satellite features. To add Google Earth Engine (GEE) capabilities for computing features on the fly:

### Prerequisites

1. Google Earth Engine account (free): https://earthengine.google.com/signup/
2. Python environment with `earthengine-api`

### Installation Steps

```bash
# Install the Earth Engine Python API
pip install earthengine-api

# Authenticate (first time only)
earthengine authenticate

# Then initialize Earth Engine in your Python script before making requests:
#   import ee
#   ee.Initialize()  # recent API versions may require ee.Initialize(project="<your-cloud-project>")
```

### Computing Satellite Features

Use the GEE code from `RemoteAuditOfBrokenRCT.R` to compute features, for example in Python:

```python
import ee

# Assumes ee.Initialize() has already been called (see Installation Steps).


def compute_satellite_features(lat, lon, start_date, end_date):
    """
    Compute NDVI, EVI, and VIIRS features for a location.

    Args:
        lat, lon: Coordinates (decimal degrees)
        start_date, end_date: Date range (YYYY-MM-DD)

    Returns:
        dict with ndvi_median, evi_median, and viirs_median
        (a value is None if no valid pixel covers the point)
    """
    point = ee.Geometry.Point([lon, lat])  # GEE expects [lon, lat] order

    # MODIS vegetation indices (250 m, 16-day); raw values are scaled by 10,000
    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
             .filterDate(start_date, end_date)
             .select(['NDVI', 'EVI'])
             .map(lambda img: img.multiply(0.0001)))
    ndvi_median = modis.select('NDVI').median()
    evi_median = modis.select('EVI').median()

    # VIIRS nighttime lights (monthly average radiance)
    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
             .filterDate(start_date, end_date)
             .select(['avg_rad']))
    viirs_median = viirs.median()

    # Read the composite values at the point. reduceRegion returns a dict keyed
    # by band name; sample() drops masked pixels, so .first() could return None
    # and indexing sample['properties'] would then fail.
    values = (ndvi_median.addBands(evi_median).addBands(viirs_median)
              .reduceRegion(reducer=ee.Reducer.first(), geometry=point, scale=250)
              .getInfo())

    return {
        'ndvi_median': values.get('NDVI'),
        'evi_median': values.get('EVI'),
        'viirs_median': values.get('avg_rad'),
    }
```

### Integration with R (via reticulate)

To integrate GEE into your R workflow:

```r
library(reticulate)

# Point reticulate at your Python environment (must be set before Python is initialized)
Sys.setenv(RETICULATE_PYTHON = "/path/to/python")

# Import Earth Engine
ee <- import("ee")
ee$Initialize()

# Define a Python function and call it from R
compute_features <- py_run_string("
def get_features(lat, lon):
    # Your GEE code here
    return {'ndvi_median': ..., 'viirs_median': ...}
")

features <- compute_features$get_features(lat, lon)
```

## App Configuration on Hugging Face

### Required Files

- `app.R` - Main Shiny application
- `Dockerfile` - Container configuration
- `Islam2019_WithGeocodesAndSatData.Rdata` - Example dataset (optional)

### Environment Variables

None required for basic functionality.

### Secrets (if adding GEE)

To enable on-the-fly GEE queries:

1. Add a `GOOGLE_APPLICATION_CREDENTIALS` secret in the HF Space settings.
2. Provide the service-account JSON key. Google client libraries expect `GOOGLE_APPLICATION_CREDENTIALS` to hold the *path* to the key file, so you may need to write the secret's contents to a file at startup.
3. Modify the app to call the GEE API.

## Usage Guide

### Running a Randomization Audit

1. **Load Data**: Upload a CSV or use the example.
2. **Configure Audit**:
   - Audit Type: "Randomization"
   - Treatment Column: select the column (e.g., `begum_treat`)
   - Control Value: 1
   - Treatment Value: 2
3. **Select Features**: Check `ndvi_median` and `viirs_median`.
4. **Choose Learner**: Logistic (fast) or XGBoost (more flexible).
5. **Set Parameters**:
   - K-Folds: 5-10 (higher = more robust, but slower)
   - Resamples: 1000-2000 (higher = more precise p-value, but slower)
6. **Run Audit**: Click the "Run Audit" button.

### Running a Missingness Audit

Same steps, but:

- Audit Type: "Missingness"
- Select the variable to check for missing-data patterns

### Interpreting Results

- **p < 0.05**: Assignment is MORE predictable from the satellite features than the stated randomization would imply, suggesting a potential deviation from that design.
- **p ≥ 0.05**: No evidence of deviation detected (this does not prove that randomization was carried out as stated).

## Technical Notes

### Computation Time

- Logistic: ~1-3 minutes for 500 units and 1000 resamples
- XGBoost: ~3-10 minutes (depending on tree settings)

### Memory Requirements

- Small datasets (<1000 units): 2 GB RAM is sufficient
- Large datasets (>5000 units): consider 4 GB+ RAM

### Handling Missing Satellite Data

If your CSV has missing satellite features:

- The app will drop rows with missing values
- Consider imputing them before upload, or
- Use GEE to compute features for the missing locations

## Troubleshooting

### "Feature not found" error

- Check that your CSV has columns named exactly `ndvi_median` and `viirs_median`
- Column names are case-sensitive

### "Too few complete cases" error

- Ensure at least 10 units have both a valid treatment assignment and satellite features
- Check for NA values in your data

### GEE authentication issues

```bash
# Re-authenticate
earthengine authenticate

# Check credentials
python -c "import ee; ee.Initialize(); print('Success!')"
```

### Dockerfile build fails

```bash
# Test locally
docker build -t remote-audit .
docker run -p 7860:7860 remote-audit
```

## Citation

If you use this app, please cite:

```
Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests of Randomization, Selection, and Missingness with Broadly Accessible Satellite Imagery.
```

## Support

For issues or questions:

- Check the paper's technical appendix
- Review the example code in `RemoteAuditOfBrokenRCT.R`
- Contact: [your contact info]
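
## Appendix: Sketch of the Randomization Audit Logic

For readers who want to see the mechanics behind "Running a Randomization Audit" and "Interpreting Results", here is a simplified, standalone Python sketch. It is *not* the app's implementation (that lives in `app.R`) and not necessarily the exact test from the paper. It assumes: the stated design is complete randomization with a fixed number of treated units, so null draws are generated by permuting the treatment labels; a numpy-only logistic regression (fit by Newton-Raphson) stands in for the app's Logistic/XGBoost learners; and predictability is scored by mean out-of-fold log-likelihood. All function names (`prepare`, `fit_logistic`, `cv_log_likelihood`, `randomization_audit`) are hypothetical. Blocked or clustered designs would need labels re-drawn within blocks or clusters instead.

```python
import numpy as np


def prepare(treat_raw, features, control_value=1, treatment_value=2, min_complete=10):
    """Hypothetical helper: recode treatment to 0/1 and drop incomplete rows."""
    treat_raw = np.asarray(treat_raw, dtype=float)
    X = np.asarray(features, dtype=float)
    keep = np.isin(treat_raw, [control_value, treatment_value]) & ~np.isnan(X).any(axis=1)
    if keep.sum() < min_complete:
        raise ValueError("Too few complete cases")
    return X[keep], (treat_raw[keep] == treatment_value).astype(float)


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression with intercept, fit by Newton-Raphson (tiny ridge for stability)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (y - p)
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + ridge * np.eye(Xb.shape[1])
        w += np.linalg.solve(hess, grad)
    return w


def cv_log_likelihood(X, y, k, rng):
    """Mean out-of-fold log-likelihood of y given X (higher = more predictable)."""
    folds = rng.permutation(len(y)) % k  # balanced random fold labels
    ll = np.empty(len(y))
    for f in range(k):
        test = folds == f
        w = fit_logistic(X[~test], y[~test])
        p = 1.0 / (1.0 + np.exp(-np.clip(w[0] + X[test] @ w[1:], -30, 30)))
        ll[test] = y[test] * np.log(p) + (1 - y[test]) * np.log(1 - p)
    return ll.mean()


def randomization_audit(X, treat, n_resamples=1000, k=5, seed=0):
    """Permutation version of the audit, assuming complete randomization."""
    rng = np.random.default_rng(seed)
    observed = cv_log_likelihood(X, treat, k, rng)
    # Null draws: re-randomize assignment (same number treated) and re-score.
    null = np.array([cv_log_likelihood(X, rng.permutation(treat), k, rng)
                     for _ in range(n_resamples)])
    # One-sided p-value: share of re-randomized labels at least as predictable.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_resamples)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    ndvi_median = rng.normal(0.5, 0.1, n)
    viirs_median = rng.gamma(2.0, 1.5, n)
    begum_treat = rng.choice([1, 2], size=n)  # synthetic: independent of features
    X, treat = prepare(begum_treat, np.column_stack([ndvi_median, viirs_median]))
    stat, p = randomization_audit(X, treat, n_resamples=200, k=5)
    print(f"CV log-likelihood = {stat:.4f}, p = {p:.3f}")
```

In the synthetic `__main__` example, assignment is drawn independently of the features, so the p-value will typically be large. The comparison `null >= observed` makes the test one-sided, matching the reading that a small p-value means assignment is more predictable than the stated randomization would allow. Folds are redrawn for every null draw, so fold-assignment noise enters the observed and re-randomized statistics in the same way.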