	---
	title: Remote audit
	emoji: 📚
	colorFrom: blue
	colorTo: yellow
	sdk: docker
	pinned: false
	license: openrail
	short_description: Remote auditor
	---


	# Remote Audit App - Setup Instructions

	This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery. It implements the conditional randomization test from the paper listed under Citation.

	## Quick Start

	### Option 1: Use Pre-computed Satellite Data (Recommended for HF)

	The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:

	1. Download the pre-processed dataset with satellite features
	2. Place `Islam2019_WithGeocodesAndSatData.Rdata` in the app directory
	3. Select "Use Example (Islam 2019)" in the app

	### Option 2: Upload Your Own CSV

	Your CSV should include:
	- Treatment assignment column (e.g., `begum_treat` with values 1=control, 2=treatment)
	- Satellite features: `ndvi_median`, `viirs_median` (or similar)
	- Any other columns for reference

	Example CSV structure:
	```
	id,begum_treat,ndvi_median,viirs_median
	1,1,0.45,2.3
	2,2,0.52,3.1
	...
	```
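Before uploading, it can help to confirm that a file matches this layout. The following is a minimal sketch and not part of the app: the helper name `check_audit_csv`, the default column names, and the allowed treatment codes are assumptions taken from the example above and should be adapted to your data.

```python
import csv
import io

def check_audit_csv(text, treat_col="begum_treat",
                    feature_cols=("ndvi_median", "viirs_median"),
                    allowed_values=("1", "2")):
    """Return a list of human-readable problems found in CSV text.

    Hypothetical helper: defaults mirror the example CSV, not the app's code.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []

    # Column names are matched exactly (case-sensitive).
    for col in (treat_col, *feature_cols):
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    for line_no, row in enumerate(reader, start=2):  # header is line 1
        if row[treat_col] not in allowed_values:
            problems.append(f"line {line_no}: unexpected {treat_col}={row[treat_col]!r}")
        for col in feature_cols:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col} is not numeric")
    return problems


example = "id,begum_treat,ndvi_median,viirs_median\n1,1,0.45,2.3\n2,2,0.52,3.1\n"
print(check_audit_csv(example))  # an empty list means no problems were found
```

An empty list means the header and every row passed these checks. It does not guarantee that the app will accept the file.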

	## Setting Up GEE for New Data

	The app uses pre-computed satellite features. To add Google Earth Engine (GEE) capabilities for computing features on the fly:

	### Prerequisites
	1. Google Earth Engine account (free): https://earthengine.google.com/signup/
	2. Python environment with `earthengine-api`

	### Installation Steps

	```bash
	# Install Earth Engine Python API
	pip install earthengine-api

	# Authenticate (first time only)
	earthengine authenticate

	# Verify initialization (inside a Python script, use: import ee; ee.Initialize())
	python -c "import ee; ee.Initialize()"
	```

	### Computing Satellite Features

	Use the GEE code from `RemoteAuditOfBrokenRCT.R` to compute features:

	```python
	import ee

	# Assumes ee.Initialize() has already been called in this session.

	def compute_satellite_features(lat, lon, start_date, end_date):
	    """
	    Compute NDVI, EVI, and VIIRS features for a location.

	    Args:
	        lat, lon: Coordinates in decimal degrees
	        start_date, end_date: Date range (YYYY-MM-DD)

	    Returns:
	        dict with ndvi_median, evi_median, viirs_median
	        (values are None when no valid pixel covers the point)
	    """
	    point = ee.Geometry.Point([lon, lat])

	    # MODIS vegetation indices (stored as integers; 0.0001 rescales them)
	    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
	             .filterDate(start_date, end_date)
	             .select(['NDVI', 'EVI'])
	             .map(lambda img: img.multiply(0.0001)))

	    ndvi_median = modis.select('NDVI').median()
	    evi_median = modis.select('EVI').median()

	    # VIIRS nighttime lights
	    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
	             .filterDate(start_date, end_date)
	             .select(['avg_rad']))

	    viirs_median = viirs.median()

	    # Sample at location (250 m scale)
	    sample = (ndvi_median.addBands(evi_median).addBands(viirs_median)
	              .sample(point, 250)
	              .first()
	              .getInfo())

	    # getInfo() returns None when the sample is empty (e.g. masked pixels)
	    props = sample['properties'] if sample else {}

	    return {
	        'ndvi_median': props.get('NDVI'),
	        'evi_median': props.get('EVI'),
	        'viirs_median': props.get('avg_rad')
	    }
	```
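To build an uploadable CSV, `compute_satellite_features` would typically be applied to every unit. The following is a minimal sketch under stated assumptions: each unit is a dict with `id`, `lat`, and `lon` keys (hypothetical names), a failure for one location should not abort the run, and the coordinates and dates shown are placeholders. The feature function is passed in as an argument so the loop can be tried with a stub before making any Earth Engine calls.

```python
import csv
import io

def build_feature_rows(units, feature_fn, start_date, end_date):
    """Call feature_fn for each unit and merge the result into the unit's row.

    units: iterable of dicts with 'id', 'lat', 'lon' (assumed key names).
    feature_fn: callable like compute_satellite_features(lat, lon, start, end).
    """
    rows = []
    for unit in units:
        try:
            features = feature_fn(unit["lat"], unit["lon"], start_date, end_date)
        except Exception as err:  # e.g. a network or quota error for one point
            print(f"unit {unit['id']}: {err}")
            features = {}
        rows.append({**unit, **features})
    return rows

def rows_to_csv(rows, columns):
    """Serialize rows to CSV text; absent or None features become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Stub standing in for compute_satellite_features, for a dry run without GEE.
def fake_features(lat, lon, start_date, end_date):
    return {"ndvi_median": 0.5, "viirs_median": 2.0}

units = [{"id": 1, "lat": 23.8, "lon": 90.4}]  # placeholder coordinates
rows = build_feature_rows(units, fake_features, "2018-01-01", "2018-12-31")
print(rows_to_csv(rows, ["id", "ndvi_median", "viirs_median"]))
```

Replacing `fake_features` with `compute_satellite_features` runs the same loop against Earth Engine, one request per unit.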

	### Integration with R (via reticulate)

	To integrate GEE in your R workflow:

	```r
	library(reticulate)

	# Set Python environment (must run before Python is first used in this session)
	Sys.setenv(RETICULATE_PYTHON = "/path/to/python")

	# Import Earth Engine
	ee <- import("ee")
	ee$Initialize()

	# Define a Python function and call it from R
	compute_features <- py_run_string("
	def get_features(lat, lon):
	    # Your GEE code here
	    return {'ndvi_median': ..., 'viirs_median': ...}
	")

	# lat and lon are numeric values defined elsewhere in your R session
	features <- compute_features$get_features(lat, lon)
	```

	## App Configuration on Hugging Face

	### Required Files
	- `app.R` - Main Shiny application
	- `Dockerfile` - Container configuration
	- `Islam2019_WithGeocodesAndSatData.Rdata` - Example dataset (optional)

	### Environment Variables
	None required for basic functionality.

	### Secrets (if adding GEE)
	If you want to enable on-the-fly GEE queries:
	1. Add `GOOGLE_APPLICATION_CREDENTIALS` secret in HF Space settings
	2. Upload service account JSON
	3. Modify app to call GEE API
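Step 3 could start from something like the sketch below. It assumes the `GOOGLE_APPLICATION_CREDENTIALS` secret holds a path to the service account JSON key inside the container (if your Space stores the JSON text itself, write it to a file first) and that the service account has Earth Engine access. The helper names are hypothetical.

```python
import json
import os

def service_account_email(key_path):
    """Read the client_email field from a service-account JSON key file."""
    with open(key_path, encoding="utf-8") as handle:
        return json.load(handle)["client_email"]

def init_earth_engine_from_env(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Initialize Earth Engine with the key file named by env_var.

    Hypothetical helper; assumes env_var contains a file path, not JSON text.
    """
    import ee  # imported lazily so the rest of the app runs without earthengine-api

    key_path = os.environ[env_var]
    credentials = ee.ServiceAccountCredentials(service_account_email(key_path), key_path)
    ee.Initialize(credentials)
```

Calling `init_earth_engine_from_env()` once at app start-up would replace the interactive `earthengine authenticate` step, which is not available inside a Space container.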

	## Usage Guide

	### Running a Randomization Audit

	1. Load Data: Upload CSV or use example
	2. Configure Audit:
	- Audit Type: "Randomization"
	- Treatment Column: Select column (e.g., `begum_treat`)
	- Control Value: 1
	- Treatment Value: 2
	3. Select Features: Check `ndvi_median` and `viirs_median`
	4. Choose Learner: Logistic (fast) or XGBoost (more flexible)
	5. Set Parameters:
	- K-Folds: 5-10 (more folds train each model on more of the data but take longer)
	- Resamples: 1000-2000 (more resamples give a more precise p-value but take longer)
	6. Run Audit: Click "Run Audit" button
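To make the roles of K-Folds and Resamples concrete, here is a simplified sketch of a permutation-style audit. It is not the app's code: it swaps a nearest-centroid classifier in for the Logistic and XGBoost learners, uses out-of-fold accuracy as the test statistic, and assumes complete randomization, so that every reshuffling of the labels is equally likely. The paper's conditional randomization test may differ in both the statistic and the resampling scheme. All function names are hypothetical.

```python
import random

def cv_accuracy(features, labels, k_folds, fold_ids):
    """Out-of-fold accuracy of a nearest-centroid classifier."""
    correct = 0
    for fold in range(k_folds):
        train = [i for i in range(len(labels)) if fold_ids[i] != fold]
        test = [i for i in range(len(labels)) if fold_ids[i] == fold]
        centroids = {}
        for value in sorted(set(labels[i] for i in train)):
            members = [features[i] for i in train if labels[i] == value]
            centroids[value] = [sum(col) / len(members) for col in zip(*members)]
        for i in test:
            guess = min(centroids, key=lambda v: sum(
                (a - b) ** 2 for a, b in zip(features[i], centroids[v])))
            correct += guess == labels[i]
    return correct / len(labels)

def randomization_audit(features, labels, k_folds=5, resamples=1000, seed=0):
    """Return (observed accuracy, permutation p-value)."""
    rng = random.Random(seed)
    fold_ids = [i % k_folds for i in range(len(labels))]
    rng.shuffle(fold_ids)
    observed = cv_accuracy(features, labels, k_folds, fold_ids)

    exceed = 0
    shuffled = list(labels)
    for _ in range(resamples):
        rng.shuffle(shuffled)  # re-draw the assignment under complete randomization
        if cv_accuracy(features, shuffled, k_folds, fold_ids) >= observed:
            exceed += 1
    # The +1 terms count the observed assignment as one of the draws.
    return observed, (1 + exceed) / (1 + resamples)

# Toy data in which treated units (label 2) sit in visibly greener areas.
features = [[i / 100] for i in range(20)] + [[1 + i / 100] for i in range(20)]
labels = [1] * 20 + [2] * 20
observed, p_value = randomization_audit(features, labels, resamples=200)
print(observed, p_value)
```

In this toy example the feature separates the two groups almost perfectly, so the observed accuracy is far above what reshuffled labels achieve and the p-value is small.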

	### Running a Missingness Audit

	Same steps but:
	- Audit Type: "Missingness"
	- Select variable to check for missing data patterns
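One plausible reading of this mode is that the audit's outcome becomes an indicator for whether the selected variable is missing, and the test then asks whether satellite features predict that indicator. The sketch below builds such an indicator. It is an assumption about the general idea, not the app's implementation, and the column name `income` and the set of missing-value tokens are hypothetical.

```python
def missingness_labels(rows, column, missing_tokens=("", "NA", "NaN", "nan")):
    """Return 1 where `column` is missing in a row and 0 where it is observed."""
    labels = []
    for row in rows:
        value = row.get(column)
        is_missing = value is None or str(value).strip() in missing_tokens
        labels.append(1 if is_missing else 0)
    return labels

rows = [
    {"id": "1", "income": "1200"},
    {"id": "2", "income": "NA"},
    {"id": "3", "income": ""},
]
print(missingness_labels(rows, "income"))  # [0, 1, 1]
```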

	### Interpreting Results

	- p < 0.05: Assignment is more predictable from satellite features than would be expected under the stated randomization → potential deviation from that randomization
	- p ≥ 0.05: No evidence of deviation detected (this does not prove that randomization was carried out perfectly)
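When a p-value lands close to 0.05, the number of resamples affects how much that estimate can be trusted. A rough guide, assuming the resamples are independent draws, is the binomial standard error of a Monte Carlo p-value. The helper name is hypothetical.

```python
import math

def mc_standard_error(p_value, resamples):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p_value * (1.0 - p_value) / resamples)

for resamples in (1000, 2000):
    print(resamples, round(mc_standard_error(0.05, resamples), 4))
# 1000 0.0069
# 2000 0.0049
```

With 1000 resamples, an estimated p-value of 0.05 carries a standard error of about 0.007, so a result such as 0.045 or 0.055 is better treated as borderline than as a clear pass or fail.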

	## Technical Notes

	### Computation Time
	- Logistic: ~1-3 minutes for 500 units, 1000 resamples
	- XGBoost: ~3-10 minutes (depends on tree settings)

	### Memory Requirements
	- Small datasets (<1000 units): 2GB RAM sufficient
	- Large datasets (>5000 units): Consider 4GB+ RAM

	### Handling Missing Satellite Data

	If your CSV has missing satellite features:
	- App will drop rows with missing values
	- Consider imputation before upload, or
	- Use GEE to compute features for missing locations
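To preview how many rows would be dropped, and which units would need features recomputed, the complete cases can be separated before upload. This is a minimal sketch: it treats any non-numeric or non-finite value as missing, which is an assumption and may not match the app's own rule.

```python
import math

def split_complete_cases(rows, columns):
    """Separate rows with finite numeric values in every column from the rest."""
    complete, incomplete = [], []
    for row in rows:
        try:
            values = [float(row[col]) for col in columns]
            ok = all(math.isfinite(v) for v in values)
        except (KeyError, TypeError, ValueError):
            ok = False
        (complete if ok else incomplete).append(row)
    return complete, incomplete

rows = [
    {"id": "1", "ndvi_median": "0.45", "viirs_median": "2.3"},
    {"id": "2", "ndvi_median": "NA", "viirs_median": "3.1"},
    {"id": "3", "ndvi_median": "0.40", "viirs_median": ""},
]
complete, incomplete = split_complete_cases(rows, ["ndvi_median", "viirs_median"])
print(len(complete), [row["id"] for row in incomplete])  # 1 ['2', '3']
```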

	## Troubleshooting

	### "Feature not found" error
	- Check that your CSV has columns named exactly: `ndvi_median`, `viirs_median`
	- Column names are case-sensitive
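Near-miss column names, such as different capitalization or a small typo, are easy to overlook by eye. The sketch below uses Python's standard `difflib` module to suggest the closest header for each expected name. The helper name and the example header are hypothetical.

```python
import difflib

def suggest_columns(expected, actual):
    """Map each expected column that is absent to the closest actual name, if any."""
    suggestions = {}
    lowered = {name.strip().lower(): name for name in actual}
    for name in expected:
        if name in actual:
            continue  # exact match, nothing to fix
        match = lowered.get(name.lower())  # same name, different case or padding
        if match is None:
            close = difflib.get_close_matches(name, actual, n=1)
            match = close[0] if close else None
        suggestions[name] = match
    return suggestions

header = ["id", "begum_treat", "NDVI_Median", "viirs_medain"]
print(suggest_columns(["ndvi_median", "viirs_median"], header))
```

Each key in the result is an expected name that was not found, and each value is the header it most likely corresponds to, or `None` when nothing is close.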

	### "Too few complete cases" error
	- Ensure at least 10 units have both valid treatment assignment and satellite features
	- Check for NA values in your data

	### GEE authentication issues
	```bash
	# Re-authenticate
	earthengine authenticate

	# Check credentials
	python -c "import ee; ee.Initialize(); print('Success!')"
	```

	### Dockerfile build fails
	```bash
	# Test locally
	docker build -t remote-audit .
	docker run -p 7860:7860 remote-audit
	```

	## Citation

	If you use this app, please cite:

	```
	Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests
	of Randomization, Selection, and Missingness with Broadly Accessible
	Satellite Imagery.
	```

	## Support

	For issues or questions:
	- Check the paper's technical appendix
	- Review example code in `RemoteAuditOfBrokenRCT.R`
	- Contact: [your contact info]