metadata

title: Remote audit
emoji: 📚
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
license: openrail
short_description: Remote auditor

Remote Audit App - Setup Instructions

This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery. It implements the conditional randomization test from the paper listed under Citation.

Quick Start

Option 1: Use Pre-computed Satellite Data (Recommended for HF)

The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:

Download the pre-processed dataset with satellite features
Place Islam2019_WithGeocodesAndSatData.Rdata in the app directory
Select "Use Example (Islam 2019)" in the app

Option 2: Upload Your Own CSV

Your CSV should include:

Treatment assignment column (e.g., begum_treat with values 1=control, 2=treatment)
Satellite features: ndvi_median, viirs_median (or similar)
Any other columns for reference

Example CSV structure:

id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
...
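Before uploading, it can help to check the file against the structure above. The following is a minimal sketch using only the Python standard library; the helper name `check_csv` is hypothetical, and the column names and the 1/2 treatment codes are taken from the example rather than required by the app, which lets you choose the treatment column and values.

```python
import csv
import io

# Column names and codes follow the example CSV above (assumptions, adjust to your data)
REQUIRED_COLUMNS = ["begum_treat", "ndvi_median", "viirs_median"]
ALLOWED_TREATMENT_CODES = {"1", "2"}  # 1 = control, 2 = treatment


def check_csv(text, required=REQUIRED_COLUMNS, treat_col="begum_treat"):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required if c not in header]
    if treat_col in header:
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            value = (row[treat_col] or "").strip()
            if value not in ALLOWED_TREATMENT_CODES:
                problems.append(
                    f"line {line_no}: unexpected treatment code {value!r}"
                )
    return problems


example = "id,begum_treat,ndvi_median,viirs_median\n1,1,0.45,2.3\n2,2,0.52,3.1\n"
print(check_csv(example))  # an empty list means no problems were found
```

To check a file on disk, read it into a string first and pass that string to `check_csv`.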

Setting Up GEE for New Data

The app itself only reads pre-computed satellite features. To compute features for new locations with Google Earth Engine (GEE), set up the following:

Prerequisites

Google Earth Engine account (free): https://earthengine.google.com/signup/
Python environment with earthengine-api

Installation Steps

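# Install the Earth Engine Python API (shell)
pip install earthengine-api

# Authenticate (shell, first time only)
earthengine authenticate

# Initialize in your Python script
import ee
ee.Initialize()  # recent earthengine-api versions may also need project='your-cloud-project-id'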

Computing Satellite Features

The GEE code in RemoteAuditOfBrokenRCT.R shows how the features are computed. A Python function along the same lines:

import ee

# Assumes ee.Initialize() has already been called (see Installation Steps)

def compute_satellite_features(lat, lon, start_date, end_date):
    """
    Compute median NDVI, EVI, and VIIRS features for a location
    
    Args:
        lat, lon: Coordinates in decimal degrees
        start_date, end_date: Date range (YYYY-MM-DD)
    
    Returns:
        dict with ndvi_median, evi_median, viirs_median
        (values are None where no valid pixel is available)
    """
    point = ee.Geometry.Point([lon, lat])  # GEE expects [lon, lat] order
    
    # MODIS vegetation indices; 0.0001 is the MOD13Q1 scale factor
    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
             .filterDate(start_date, end_date)
             .select(['NDVI', 'EVI'])
             .map(lambda img: img.multiply(0.0001)))
    
    ndvi_median = modis.select('NDVI').median()
    evi_median = modis.select('EVI').median()
    
    # VIIRS nighttime lights (monthly average radiance)
    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
             .filterDate(start_date, end_date)
             .select(['avg_rad']))
    
    viirs_median = viirs.median()
    
    # Sample at location (250 m scale, the MOD13Q1 resolution)
    sample = (ndvi_median.addBands([evi_median, viirs_median])
              .sample(point, 250)
              .first()
              .getInfo())
    
    # sample() skips masked pixels, so there may be no feature to read
    props = sample['properties'] if sample else {}
    
    return {
        'ndvi_median': props.get('NDVI'),
        'evi_median': props.get('EVI'),
        'viirs_median': props.get('avg_rad')
    }
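To build a table for the app, the function has to be applied to every unit. The sketch below shows one way to do that; `add_features`, `fake_features`, the `lat`/`lon` keys, the coordinates, and the dates are all hypothetical. The feature function is passed in as an argument so the example runs without a GEE account; in practice you would pass `compute_satellite_features` instead of the stand-in.

```python
def add_features(rows, feature_fn, start_date, end_date):
    """Return copies of rows extended with the dict returned by feature_fn."""
    enriched = []
    for row in rows:
        features = feature_fn(row["lat"], row["lon"], start_date, end_date)
        enriched.append({**row, **features})
    return enriched


def fake_features(lat, lon, start_date, end_date):
    """Offline stand-in for compute_satellite_features (placeholder values)."""
    return {"ndvi_median": 0.5, "evi_median": 0.3, "viirs_median": 1.0}


# Hypothetical units with coordinates in decimal degrees
units = [
    {"id": 1, "lat": 23.8, "lon": 90.4},
    {"id": 2, "lat": 24.9, "lon": 91.9},
]
table = add_features(units, fake_features, "2018-01-01", "2018-12-31")
print(table[0])
```

With the real function, each unit triggers its own blocking `getInfo()` request, so large datasets may take a while; batching the points into a single GEE request is a possible refinement that is not shown here.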

Integration with R (via reticulate)

To integrate GEE in your R workflow:

# Set the Python environment before reticulate initializes Python
Sys.setenv(RETICULATE_PYTHON = "/path/to/python")

library(reticulate)

# Import Earth Engine
ee <- import("ee")
ee$Initialize()

# Define a Python function and make it callable from R
compute_features <- py_run_string("
def get_features(lat, lon):
    # Your GEE code here
    return {'ndvi_median': ..., 'viirs_median': ...}
")

# lat and lon must already be defined in R (numeric, decimal degrees)
features <- compute_features$get_features(lat, lon)

App Configuration on Hugging Face

Required Files

app.R - Main Shiny application
Dockerfile - Container configuration
Islam2019_WithGeocodesAndSatData.Rdata - Example dataset (optional)

Environment Variables

None required for basic functionality.

Secrets (if adding GEE)

If you want to enable on-the-fly GEE queries:

Add GOOGLE_APPLICATION_CREDENTIALS secret in HF Space settings
Upload service account JSON
Modify app to call GEE API
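On the Python side, the steps above might look like the sketch below. It assumes that `GOOGLE_APPLICATION_CREDENTIALS` holds the path to the service account JSON key inside the container (the usual Google convention; this README does not specify it) and that the key contains the standard `client_email` field. The helper names are hypothetical.

```python
import json
import os


def read_service_account(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return (key_path, client_email) for the key file named by env_var."""
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"{env_var} is not set")
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    return key_path, key["client_email"]


def init_earth_engine():
    """Initialize Earth Engine with the service account (needs earthengine-api)."""
    import ee  # imported lazily so read_service_account works without the package

    key_path, email = read_service_account()
    credentials = ee.ServiceAccountCredentials(email, key_file=key_path)
    ee.Initialize(credentials)
```

Calling `init_earth_engine()` once at startup replaces the interactive `earthengine authenticate` step, which is not available inside a running Space.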

Usage Guide

Running a Randomization Audit

Load Data: Upload CSV or use example
Configure Audit:
- Audit Type: "Randomization"
- Treatment Column: Select column (e.g., begum_treat)
- Control Value: 1
- Treatment Value: 2
Select Features: Check ndvi_median and viirs_median
Choose Learner: Logistic (fast) or XGBoost (more flexible)
Set Parameters:
- K-Folds: 5-10 (higher = more robust)
- Resamples: 1000-2000 (higher = more precise p-value)
Run Audit: Click "Run Audit" button

Running a Missingness Audit

Follow the same steps as for a randomization audit, with these changes:

Audit Type: "Missingness"
Select variable to check for missing data patterns

Interpreting Results

p < 0.05: Assignment is more predictable from satellite features than expected under the stated randomization → potential deviation from that randomization
p ≥ 0.05: No evidence of deviation detected (this does not prove that randomization was carried out as stated)
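To make the logic behind the p-value concrete, here is a deliberately simplified sketch of a resampling test. It is not the app's implementation: the app scores how well a cross-validated learner predicts assignment, whereas this sketch swaps in a plain difference in feature means, assumes complete randomization with fixed group sizes, and codes assignment as 0/1. All names and data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def imbalance(assign, feature):
    """Absolute difference in feature means between treated (1) and control (0)."""
    treated = [x for a, x in zip(assign, feature) if a == 1]
    control = [x for a, x in zip(assign, feature) if a == 0]
    return abs(mean(treated) - mean(control))


def permutation_p_value(assign, feature, resamples=1000, seed=0):
    """Share of re-drawn assignments at least as imbalanced as the observed one."""
    rng = random.Random(seed)
    observed = imbalance(assign, feature)
    shuffled = list(assign)
    extreme = 0
    for _ in range(resamples):
        rng.shuffle(shuffled)  # re-draw the assignment, keeping group sizes fixed
        if imbalance(shuffled, feature) >= observed:
            extreme += 1
    return (1 + extreme) / (1 + resamples)  # add-one correction keeps p above 0


# Hypothetical data: every treated unit has a higher NDVI than every control unit
assign = [0] * 10 + [1] * 10
ndvi = [0.30 + 0.01 * i for i in range(20)]
p = permutation_p_value(assign, ndvi)
print(f"p = {p:.4f}")
```

The more resamples are drawn, the finer the grid of attainable p-values: with 1000 resamples the smallest possible value is 1/1001, which is why the Resamples setting controls precision.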

Technical Notes

Computation Time

Logistic: ~1-3 minutes for 500 units, 1000 resamples
XGBoost: ~3-10 minutes (depends on tree settings)

Memory Requirements

Small datasets (<1000 units): 2GB RAM sufficient
Large datasets (>5000 units): Consider 4GB+ RAM

Handling Missing Satellite Data

If your CSV has missing satellite features:

The app drops rows with missing values
To keep those rows, impute the missing values before upload, or
Use GEE to compute features for the missing locations
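To see in advance how many rows would survive, a complete-case filter can be run before upload. This is a sketch of the general idea rather than the app's own code; the helper names and the set of strings treated as missing are assumptions, and the default minimum of 10 mirrors the threshold mentioned under Troubleshooting.

```python
import math


def is_missing(value):
    """Treat None, NaN, empty strings, and common NA spellings as missing."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip() in {"", "NA", "NaN", "nan"}


def complete_cases(rows, columns, minimum=10):
    """Keep rows with no missing value in columns; raise if too few remain."""
    kept = [r for r in rows if not any(is_missing(r.get(c)) for c in columns)]
    if len(kept) < minimum:
        raise ValueError(f"Too few complete cases: {len(kept)} < {minimum}")
    return kept


rows = [
    {"begum_treat": 1, "ndvi_median": 0.45, "viirs_median": 2.3},
    {"begum_treat": 2, "ndvi_median": "NA", "viirs_median": 3.1},
]
columns = ["begum_treat", "ndvi_median", "viirs_median"]
kept = complete_cases(rows, columns, minimum=1)
print(len(kept), "of", len(rows), "rows kept")
```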

Troubleshooting

"Feature not found" error

Check that your CSV has columns named exactly: ndvi_median, viirs_median
Column names are case-sensitive
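A quick way to spot a near-miss is to compare the CSV header against the expected names while ignoring case and surrounding spaces. The helper below is a hypothetical sketch for local use, not part of the app.

```python
def suggest_columns(header, expected=("ndvi_median", "viirs_median")):
    """Map each missing expected name to header entries that differ only by case or spacing."""
    suggestions = {}
    for name in expected:
        if name in header:
            continue  # exact match, nothing to fix
        near = [h for h in header if h.strip().lower() == name.lower()]
        suggestions[name] = near
    return suggestions


# Hypothetical header with a case mismatch and a stray leading space
print(suggest_columns(["id", "NDVI_median", " viirs_median"]))
```

An empty result means every expected column is present exactly as named; an empty list for a name means no close match exists and the column is probably absent.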

"Too few complete cases" error

Ensure at least 10 units have both valid treatment assignment and satellite features
Check for NA values in your data

GEE authentication issues

# Re-authenticate
earthengine authenticate

# Check credentials
python -c "import ee; ee.Initialize(); print('Success!')"

Dockerfile build fails

# Test locally
docker build -t remote-audit .
docker run -p 7860:7860 remote-audit

Citation

If you use this app, please cite:

Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests 
of Randomization, Selection, and Missingness with Broadly Accessible 
Satellite Imagery.

Support

For issues or questions:

Check the paper's technical appendix
Review example code in RemoteAuditOfBrokenRCT.R
Contact: [your contact info]