Spaces: cjerzak/ra

cjerzak committed on Oct 15, 2025

Commit f7e2bc2 (verified) · 1 Parent(s): 3fe098e

Upload 3 files

Files changed (2)

README.md +228 -12
app.R +10 -2

README.md CHANGED

@@ -1,12 +1,228 @@
----
-title: ra
-emoji: 📚
-colorFrom: blue
-colorTo: yellow
-sdk: docker
-pinned: false
-license: mit
-short_description: 'ra. '
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Remote Audit App - Setup Instructions
+This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery, implementing the conditional randomization test from the accompanying paper (Jerzak & Daoud, 2025; see Citation).
+## Quick Start
+### Option 1: Use Pre-computed Satellite Data (Recommended for HF)
+The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:
+1. Download the pre-processed dataset with satellite features
+2. Place `Islam2019_WithGeocodesAndSatData.Rdata` in the app directory
+3. Select "Use Example (Islam 2019)" in the app
+### Option 2: Upload Your Own CSV
+Your CSV should include:
+- Treatment assignment column (e.g., `begum_treat` with values 1=control, 2=treatment)
+- Satellite features: `ndvi_median`, `viirs_median` (or similar)
+- Any other columns for reference
+Example CSV structure:
+```
+id,begum_treat,ndvi_median,viirs_median
+1,1,0.45,2.3
+2,2,0.52,3.1
+...
+```
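Reviewer note (not part of the commit): the CSV layout above can be checked before upload. The following is a minimal sketch in Python, assuming the column names and the 1/2 treatment coding from the README example; `check_upload` is a hypothetical helper and is not taken from `app.R`.

```python
import csv
import io

# Column names and treatment codes are taken from the README example;
# adjust them if your file uses different names (matching is case-sensitive).
REQUIRED_COLUMNS = ("begum_treat", "ndvi_median", "viirs_median")
ALLOWED_TREATMENT_VALUES = {"1", "2"}  # 1 = control, 2 = treatment


def check_upload(csv_text, treat_col="begum_treat"):
    """Return (missing_columns, unexpected_treatment_values) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    unexpected = set()
    if treat_col in header:
        for row in reader:
            value = (row[treat_col] or "").strip()
            if value not in ALLOWED_TREATMENT_VALUES:
                unexpected.add(value)
    return missing, sorted(unexpected)


example = """id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
"""
print(check_upload(example))
```

An empty list in both positions means the header and treatment coding match what the README describes.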
+## Setting Up GEE for New Data
+The app uses **pre-computed** satellite features. To add GEE capabilities for computing features on-the-fly:
+### Prerequisites
+1. Google Earth Engine account (free): https://earthengine.google.com/signup/
+2. Python environment with `earthengine-api`
+### Installation Steps
+```bash
+# Install Earth Engine Python API
+pip install earthengine-api
+# Authenticate (first time only)
+earthengine authenticate
+# Verify initialization from the shell (in a Python script, call ee.Initialize() after `import ee`)
+python -c "import ee; ee.Initialize()"
+```
+### Computing Satellite Features
+Use the GEE code from `RemoteAuditOfBrokenRCT.R` to compute features:
+```python
+import ee
+import pandas as pd
+def compute_satellite_features(lat, lon, start_date, end_date):
+    """
+    Compute NDVI, EVI, and VIIRS features for a location
+    Args:
+        lat, lon: Coordinates
+        start_date, end_date: Date range (YYYY-MM-DD)
+    Returns:
+        dict with ndvi_median, viirs_median, etc.
+    """
+    point = ee.Geometry.Point([lon, lat])
+    # MODIS vegetation indices
+    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
+             .filterDate(start_date, end_date)
+             .select(['NDVI', 'EVI'])
+             .map(lambda img: img.multiply(0.0001)))
+    ndvi_median = modis.select('NDVI').median()
+    evi_median = modis.select('EVI').median()
+    # VIIRS nighttime lights
+    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
+             .filterDate(start_date, end_date)
+             .select(['avg_rad']))
+    viirs_median = viirs.median()
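+    # Sample at location (250 m matches the MOD13Q1 pixel size)
+    sample = (ndvi_median.addBands([evi_median, viirs_median])
+              .sample(point, 250)
+              .first()
+              .getInfo())
+    # An empty sample (e.g. masked pixel, no images in the date range) yields None
+    if sample is None:
+        return {'ndvi_median': None, 'evi_median': None, 'viirs_median': None}
+    return {
+        'ndvi_median': sample['properties'].get('NDVI'),
+        'evi_median': sample['properties'].get('EVI'),
+        'viirs_median': sample['properties'].get('avg_rad')
+    }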
+```
+### Integration with R (via reticulate)
+To integrate GEE in your R workflow:
+```r
+library(reticulate)
+# Set Python environment
+Sys.setenv(RETICULATE_PYTHON = "/path/to/python")
+# Import Earth Engine
+ee <- import("ee")
+ee$Initialize()
+# Call Python function from R
+compute_features <- py_run_string("
+def get_features(lat, lon):
+    # Your GEE code here
+    return {'ndvi_median': ..., 'viirs_median': ...}
+")
+features <- compute_features$get_features(lat, lon)
+```
+## App Configuration on Hugging Face
+### Required Files
+- `app.R` - Main Shiny application
+- `Dockerfile` - Container configuration
+- `Islam2019_WithGeocodesAndSatData.Rdata` - Example dataset (optional)
+### Environment Variables
+None required for basic functionality.
+### Secrets (if adding GEE)
+If you want to enable on-the-fly GEE queries:
+1. Add `GOOGLE_APPLICATION_CREDENTIALS` secret in HF Space settings
+2. Upload service account JSON
+3. Modify app to call GEE API
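Reviewer note (not part of the commit): the `app.R` hunk in this same commit reads `GEE_EMAIL` and `GEE_KEY` via `Sys.getenv`, whereas the README names `GOOGLE_APPLICATION_CREDENTIALS`. The sketch below shows, in Python, one way a service-account setup could be wired up. It assumes that `GEE_KEY` holds a path to the service-account JSON file, which the diff does not confirm; `load_gee_settings` and `init_earth_engine` are hypothetical helper names.

```python
import os


def load_gee_settings(env=None):
    """Read GEE_EMAIL and GEE_KEY and fail early if either is unset.

    Assumption: GEE_KEY is a filesystem path to the service-account JSON key.
    """
    env = os.environ if env is None else env
    email = env.get("GEE_EMAIL", "").strip()
    key_path = env.get("GEE_KEY", "").strip()
    missing = [name for name, value in (("GEE_EMAIL", email), ("GEE_KEY", key_path))
               if not value]
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return email, key_path


def init_earth_engine(email, key_path):
    """Initialize Earth Engine with service-account credentials."""
    # Imported lazily so the settings check above works without earthengine-api.
    import ee
    credentials = ee.ServiceAccountCredentials(email, key_path)
    ee.Initialize(credentials)


if __name__ == "__main__":
    try:
        init_earth_engine(*load_gee_settings())
    except RuntimeError as err:
        print(err)
```

Checking the secrets before importing `ee` keeps the failure message readable when a Space is deployed without them.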
+## Usage Guide
+### Running a Randomization Audit
+1. **Load Data**: Upload CSV or use example
+2. **Configure Audit**:
+   - Audit Type: "Randomization"
+   - Treatment Column: Select column (e.g., `begum_treat`)
+   - Control Value: 1
+   - Treatment Value: 2
+3. **Select Features**: Check `ndvi_median` and `viirs_median`
+4. **Choose Learner**: Logistic (fast) or XGBoost (more flexible)
+5. **Set Parameters**:
+   - K-Folds: 5-10 (higher = more robust)
+   - Resamples: 1000-2000 (higher = more precise p-value)
+6. **Run Audit**: Click "Run Audit" button
+### Running a Missingness Audit
+Same steps but:
+- Audit Type: "Missingness"
+- Select variable to check for missing data patterns
+### Interpreting Results
+- **p < 0.05**: Assignment is MORE predictable from satellite features than expected → potential deviation from stated randomization
+- **p ≥ 0.05**: No evidence of deviation detected (but doesn't prove perfect randomization)
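Reviewer note (not part of the commit): to make the p-value interpretation concrete, here is a deliberately simplified sketch in Python. It is not the app's procedure: the app cross-validates a learner (Logistic or XGBoost), while this stand-in uses the absolute difference in feature means as the test statistic and reshuffles the assignment labels. The `(1 + hits) / (1 + resamples)` convention and all function names are assumptions for illustration.

```python
import random
from statistics import mean


def mean_gap(assign, feature):
    """Absolute difference in feature means between treated (2) and control (1)."""
    treated = [x for a, x in zip(assign, feature) if a == 2]
    control = [x for a, x in zip(assign, feature) if a == 1]
    return abs(mean(treated) - mean(control))


def permutation_p_value(assign, feature, n_resamples=1000, seed=0):
    """Share of reshuffled assignments whose gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = mean_gap(assign, feature)
    shuffled = list(assign)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # keeps the number of treated units fixed
        if mean_gap(shuffled, feature) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_resamples)


# Toy data: treated units sit on systematically greener pixels.
assign = [1] * 10 + [2] * 10
ndvi = [0.40 + 0.01 * i for i in range(10)] + [0.60 + 0.01 * i for i in range(10)]
print(permutation_p_value(assign, ndvi))
```

With this convention the smallest attainable p-value is `1 / (1 + n_resamples)`, which is one reason more resamples give a finer-grained p-value.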
+## Technical Notes
+### Computation Time
+- Logistic: ~1-3 minutes for 500 units, 1000 resamples
+- XGBoost: ~3-10 minutes (depends on tree settings)
+### Memory Requirements
+- Small datasets (<1000 units): 2GB RAM sufficient
+- Large datasets (>5000 units): Consider 4GB+ RAM
+### Handling Missing Satellite Data
+If your CSV has missing satellite features:
+- App will drop rows with missing values
+- Consider imputation before upload, or
+- Use GEE to compute features for missing locations
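Reviewer note (not part of the commit): the row-dropping behavior described above can be previewed offline. This is a minimal sketch in Python, assuming that empty strings, `NA`, and `NaN` count as missing and using the 10-unit minimum quoted in the README's troubleshooting notes; how `app.R` actually detects missing values is not shown in the diff.

```python
import csv
import io

MIN_COMPLETE_CASES = 10  # figure quoted in the README's troubleshooting notes
MISSING_TOKENS = {"", "NA", "NaN"}  # assumption about how missing values are encoded


def complete_cases(rows, columns):
    """Keep only rows where every listed column has a non-missing value."""
    def present(value):
        return value is not None and value.strip() not in MISSING_TOKENS
    return [row for row in rows if all(present(row.get(c)) for c in columns)]


csv_text = """id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,,3.1
3,1,0.50,NA
4,2,0.52,3.0
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
kept = complete_cases(rows, ["begum_treat", "ndvi_median", "viirs_median"])
print(len(rows), "rows read,", len(kept), "complete")
if len(kept) < MIN_COMPLETE_CASES:
    print("Too few complete cases for an audit")
```

Running this before upload shows how many units would survive the drop, which helps decide between imputation and recomputing features.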
+## Troubleshooting
+### "Feature not found" error
+- Check that your CSV has columns named exactly: `ndvi_median`, `viirs_median`
+- Column names are case-sensitive
+### "Too few complete cases" error
+- Ensure at least 10 units have both valid treatment assignment and satellite features
+- Check for NA values in your data
+### GEE authentication issues
+```bash
+# Re-authenticate
+earthengine authenticate
+# Check credentials
+python -c "import ee; ee.Initialize(); print('Success!')"
+```
+### Dockerfile build fails
+```bash
+# Test locally
+docker build -t remote-audit .
+docker run -p 7860:7860 remote-audit
+```
+## Citation
+If you use this app, please cite:
+```
+Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests
+of Randomization, Selection, and Missingness with Broadly Accessible
+Satellite Imagery.
+```
+## Support
+For issues or questions:
+- Check the paper's technical appendix
+- Review example code in `RemoteAuditOfBrokenRCT.R`
+- Contact: [your contact info]

app.R CHANGED

@@ -1,3 +1,4 @@
 # app.R — Remote Audit: Design-Based Tests of Randomization with Satellite Imagery
 # ==============================================================================
 # Performs conditional randomization tests to audit experimental integrity
@@ -289,6 +290,15 @@ server <- function(input, output, session) {
                       selected = grep("lon|long", cols, value = TRUE, ignore.case = TRUE)[1] %||% NULL)
   })
   output$data_preview <- renderDT({
     df <- data_loaded()
     req(df)
@@ -319,8 +329,6 @@ server <- function(input, output, session) {
       gee_email <- Sys.getenv("GEE_EMAIL", unset = NULL)
       gee_key <- Sys.getenv("GEE_KEY", unset = NULL)
-      if (!py_module_available("ee")) py_install("earthengine-api")
       py_run_string("
 import ee, pandas as pd, json

+# setwd("~/Dropbox/ImageDeconfoundAid/BrokenExperiment/ShinyApp/"); Sys.setenv(RETICULATE_PYTHON = "/Users/cjerzak/miniconda3/bin/python")
 # app.R — Remote Audit: Design-Based Tests of Randomization with Satellite Imagery
 # ==============================================================================
 # Performs conditional randomization tests to audit experimental integrity
                       selected = grep("lon|long", cols, value = TRUE, ignore.case = TRUE)[1] %||% NULL)
   })
+  observeEvent(input$data_source, {
+    if (input$data_source == "upload" && is.null(input$file_csv)) {
+      updateSelectInput(session, "treat_col", choices = character(0))
+      updateSelectInput(session, "missing_col", choices = character(0))
+      updateSelectInput(session, "lat_col", choices = character(0))
+      updateSelectInput(session, "long_col", choices = character(0))
+    }
+  })
   output$data_preview <- renderDT({
     df <- data_loaded()
     req(df)
       gee_email <- Sys.getenv("GEE_EMAIL", unset = NULL)
       gee_key <- Sys.getenv("GEE_KEY", unset = NULL)
       py_run_string("
 import ee, pandas as pd, json