title: Remote audit
emoji: 📚
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
license: openrail
short_description: Remote auditor

Remote Audit App - Setup Instructions

This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery. It implements the conditional randomization test from Jerzak and Daoud (2025); see Citation below.

Quick Start

Option 1: Use Pre-computed Satellite Data (Recommended for HF)

The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:

  1. Download the pre-processed dataset with satellite features
  2. Place Islam2019_WithGeocodesAndSatData.Rdata in the app directory
  3. Select "Use Example (Islam 2019)" in the app

Option 2: Upload Your Own CSV

Your CSV should include:

  • Treatment assignment column (e.g., begum_treat with values 1=control, 2=treatment)
  • Satellite features: ndvi_median, viirs_median (or similar)
  • Any other columns for reference

Example CSV structure:

id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
...
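
Before uploading, it can help to check a file against the layout above. The following is a minimal sketch using only the Python standard library. The helper name `check_audit_csv` and its default column names are illustrative assumptions taken from the example, not part of the app.

```python
import csv
import io

def check_audit_csv(text, treatment_col="begum_treat",
                    feature_cols=("ndvi_median", "viirs_median")):
    """Return a list of problems found in CSV text; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []
    for col in (treatment_col, *feature_cols):
        if col not in header:  # names are compared case-sensitively
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    # Header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in feature_cols:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {col} is not numeric: {row[col]!r}")
    return problems

example = """id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
"""
print(check_audit_csv(example))  # prints []
```

An empty list means the required columns are present and every feature value parses as a number. Whether the app applies further checks is not covered here.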

Setting Up GEE for New Data

The app itself only reads pre-computed satellite features. To compute features on the fly for new data, add Google Earth Engine (GEE) support as follows:

Prerequisites

  1. Google Earth Engine account (free): https://earthengine.google.com/signup/
  2. Python environment with earthengine-api

Installation Steps

# Install Earth Engine Python API
pip install earthengine-api

# Authenticate (first time only)
earthengine authenticate

# Initialize in your Python script (the two lines below are Python, not shell)
import ee
ee.Initialize()

Computing Satellite Features

Use the GEE code from RemoteAuditOfBrokenRCT.R to compute features:

import ee
import pandas as pd

def compute_satellite_features(lat, lon, start_date, end_date):
    """
    Compute NDVI, EVI, and VIIRS features for a location
    
    Args:
        lat, lon: Coordinates
        start_date, end_date: Date range (YYYY-MM-DD)
    
    Returns:
        dict with ndvi_median, viirs_median, etc.
    """
    point = ee.Geometry.Point([lon, lat])
    
    # MODIS vegetation indices
    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
             .filterDate(start_date, end_date)
             .select(['NDVI', 'EVI'])
             .map(lambda img: img.multiply(0.0001)))
    
    ndvi_median = modis.select('NDVI').median()
    evi_median = modis.select('EVI').median()
    
    # VIIRS nighttime lights
    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
             .filterDate(start_date, end_date)
             .select(['avg_rad']))
    
    viirs_median = viirs.median()
    
    # Sample at the point (250 m scale matches MOD13Q1's resolution)
    sample = (ndvi_median.addBands([evi_median, viirs_median])
              .sample(point, 250)
              .first()
              .getInfo())

    # getInfo() returns None if no unmasked pixel was sampled at the point
    props = sample['properties'] if sample else {}

    return {
        'ndvi_median': props.get('NDVI'),
        'evi_median': props.get('EVI'),
        'viirs_median': props.get('avg_rad')
    }
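
To build a feature table for many units, the per-location function can be applied row by row. The sketch below assumes each unit is a dict with hypothetical `lat` and `lon` keys, and it takes the feature function as an argument so that it can be run without Earth Engine credentials. In practice one would pass `compute_satellite_features`. The coordinates and dates are placeholder values.

```python
def add_features(units, feature_fn, start_date, end_date):
    """Return copies of `units` with satellite features merged in.

    units: iterable of dicts with 'lat' and 'lon' keys (assumed layout).
    feature_fn: callable(lat, lon, start_date, end_date) -> dict.
    Lookups that raise are recorded under an 'error' key, not dropped.
    """
    out = []
    for unit in units:
        row = dict(unit)  # copy so the caller's data is left unchanged
        try:
            row.update(feature_fn(unit["lat"], unit["lon"],
                                  start_date, end_date))
        except Exception as exc:  # e.g. network or quota errors from GEE
            row["error"] = str(exc)
        out.append(row)
    return out

# Stand-in for compute_satellite_features, used only for illustration.
def fake_features(lat, lon, start_date, end_date):
    return {"ndvi_median": 0.5, "viirs_median": 1.0}

units = [{"id": 1, "lat": 23.8, "lon": 90.4}]
print(add_features(units, fake_features, "2015-01-01", "2015-12-31"))
```

Each call to the real function makes a blocking request to Earth Engine, so large tables may be slow to process this way.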

Integration with R (via reticulate)

To integrate GEE in your R workflow:

library(reticulate)

# Set Python environment
Sys.setenv(RETICULATE_PYTHON = "/path/to/python")

# Import Earth Engine
ee <- import("ee")
ee$Initialize()

# Call Python function from R
compute_features <- py_run_string("
def get_features(lat, lon):
    # Your GEE code here
    return {'ndvi_median': ..., 'viirs_median': ...}
")

features <- compute_features$get_features(lat, lon)

App Configuration on Hugging Face

Required Files

  • app.R - Main Shiny application
  • Dockerfile - Container configuration
  • Islam2019_WithGeocodesAndSatData.Rdata - Example dataset (optional)

Environment Variables

None required for basic functionality.

Secrets (if adding GEE)

If you want to enable on-the-fly GEE queries:

  1. Add GOOGLE_APPLICATION_CREDENTIALS secret in HF Space settings
  2. Upload service account JSON
  3. Modify app to call GEE API
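
On the Python side, service account initialization might look like the sketch below. It assumes that `GOOGLE_APPLICATION_CREDENTIALS` holds the path to the uploaded JSON key file and that the service account has been registered for Earth Engine access. The helper names are hypothetical, and the app would still need to call this code, for example through reticulate.

```python
import json
import os

def read_service_account_email(key_path):
    """Return the client_email field from a service account JSON key."""
    with open(key_path, encoding="utf-8") as fh:
        return json.load(fh)["client_email"]

def init_earth_engine():
    """Initialize Earth Engine from the key file named in the environment."""
    import ee  # imported lazily so this module loads without earthengine-api

    # Assumption: the secret's value is a file path, not the JSON itself.
    key_path = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    email = read_service_account_email(key_path)
    credentials = ee.ServiceAccountCredentials(email, key_path)
    ee.Initialize(credentials)
```

Keeping the key out of the repository and supplying it only through the Space's secret settings avoids exposing the credentials publicly.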

Usage Guide

Running a Randomization Audit

  1. Load Data: Upload CSV or use example
  2. Configure Audit:
    • Audit Type: "Randomization"
    • Treatment Column: Select column (e.g., begum_treat)
    • Control Value: 1
    • Treatment Value: 2
  3. Select Features: Check ndvi_median and viirs_median
  4. Choose Learner: Logistic (fast) or XGBoost (more flexible)
  5. Set Parameters:
    • K-Folds: 5-10 (higher = more robust)
    • Resamples: 1000-2000 (higher = more precise p-value)
  6. Run Audit: Click "Run Audit" button

Running a Missingness Audit

Same steps but:

  • Audit Type: "Missingness"
  • Select variable to check for missing data patterns
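
Conceptually, a missingness audit presumably swaps the treatment column for an indicator of whether the chosen variable is missing, then asks whether that indicator is predictable from satellite features. A minimal sketch of building such an indicator follows. The 0/1 coding, the set of strings treated as missing, and the column name `income` are assumptions for illustration, not the app's documented behavior.

```python
MISSING_TOKENS = {"", "NA", "NaN", "nan", "NULL"}  # assumed spellings of "missing"

def missing_indicator(rows, column):
    """Return 1 where `column` is missing in a row, else 0."""
    flags = []
    for row in rows:
        value = row.get(column)  # None when the key is absent
        is_missing = value is None or str(value).strip() in MISSING_TOKENS
        flags.append(1 if is_missing else 0)
    return flags

rows = [{"income": "1200"}, {"income": "NA"}, {"income": ""}, {"other": "x"}]
print(missing_indicator(rows, "income"))  # prints [0, 1, 1, 1]
```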

Interpreting Results

  • p < 0.05: Assignment is more predictable from satellite features than would be expected under the stated randomization → potential deviation from the stated design
  • p ≥ 0.05: No evidence of deviation detected (this does not prove that randomization was carried out correctly)
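
The logic behind the p-value can be illustrated with a stripped-down permutation test. This is a sketch, not the app's implementation: it replaces the cross-validated learner (Logistic or XGBoost) with a simple absolute difference in feature means, and it assumes complete randomization with fixed arm sizes, so that shuffling the labels mimics the assignment mechanism. The function names and the toy data are made up for illustration.

```python
import random

def mean_gap(feature, assign):
    """Absolute difference in feature means between the two arms."""
    treated = [x for x, a in zip(feature, assign) if a == 2]
    control = [x for x, a in zip(feature, assign) if a == 1]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))

def permutation_p_value(feature, assign, resamples=1000, seed=0):
    """Share of reshuffled assignments with a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = mean_gap(feature, assign)
    shuffled = list(assign)
    hits = 0
    for _ in range(resamples):
        rng.shuffle(shuffled)  # redraw the assignment, keeping arm sizes fixed
        if mean_gap(feature, shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + resamples)  # add-one correction keeps p > 0

# Toy data: treatment (2) goes only to the high-NDVI units.
ndvi = [0.30, 0.32, 0.35, 0.36, 0.38, 0.60, 0.62, 0.65, 0.66, 0.70]
assign = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(permutation_p_value(ndvi, assign))
```

Because the toy assignment lines up perfectly with NDVI, few reshuffles match the observed gap and the p-value is small. More resamples give a finer-grained p-value, which is why the Resamples setting affects precision.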

Technical Notes

Computation Time

  • Logistic: ~1-3 minutes for 500 units, 1000 resamples
  • XGBoost: ~3-10 minutes (depends on tree settings)

Memory Requirements

  • Small datasets (<1000 units): 2GB RAM sufficient
  • Large datasets (>5000 units): Consider 4GB+ RAM

Handling Missing Satellite Data

If your CSV has missing satellite features:

  • The app drops rows with missing values
  • To keep those units, impute the missing values before upload, or
  • Use GEE to compute features for the affected locations
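
To see how many rows would survive that filtering before uploading, a check along the following lines may help. It is a sketch under assumptions: rows are dicts of strings as produced by `csv.DictReader`, a value counts as present only if it parses as a finite number, and the threshold of 10 is the one quoted under Troubleshooting. The app's own rule for what counts as missing may differ.

```python
import math

MIN_COMPLETE = 10  # threshold quoted in the troubleshooting section

def complete_cases(rows, required):
    """Keep rows whose required fields all parse as finite numbers."""
    def ok(value):
        try:
            return math.isfinite(float(value))
        except (TypeError, ValueError):
            return False
    return [r for r in rows if all(ok(r.get(c)) for c in required)]

rows = [
    {"id": "1", "begum_treat": "1", "ndvi_median": "0.45", "viirs_median": "2.3"},
    {"id": "2", "begum_treat": "2", "ndvi_median": "NA", "viirs_median": "3.1"},
    {"id": "3", "begum_treat": "2", "ndvi_median": "0.52", "viirs_median": ""},
]
kept = complete_cases(rows, ["begum_treat", "ndvi_median", "viirs_median"])
print(len(kept), "of", len(rows), "rows are complete")  # prints: 1 of 3 rows are complete
if len(kept) < MIN_COMPLETE:
    print("fewer than", MIN_COMPLETE, "complete cases")
```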

Troubleshooting

"Feature not found" error

  • Check that your CSV has columns named exactly: ndvi_median, viirs_median
  • Column names are case-sensitive

"Too few complete cases" error

  • Ensure at least 10 units have both valid treatment assignment and satellite features
  • Check for NA values in your data

GEE authentication issues

# Re-authenticate
earthengine authenticate

# Check credentials
python -c "import ee; ee.Initialize(); print('Success!')"

Dockerfile build fails

# Test locally
docker build -t remote-audit .
docker run -p 7860:7860 remote-audit

Citation

If you use this app, please cite:

Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests 
of Randomization, Selection, and Missingness with Broadly Accessible 
Satellite Imagery.

Support

For issues or questions:

  • Check the paper's technical appendix
  • Review example code in RemoteAuditOfBrokenRCT.R
  • Contact: [your contact info]