---
title: Remote audit
emoji: 📚
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
license: openrail
short_description: Remote auditor
---
# Remote Audit App - Setup Instructions
This Hugging Face Space performs design-based tests of randomization integrity using pre-treatment satellite imagery. It implements the conditional randomization test described in the accompanying paper (Jerzak & Daoud, 2025; see Citation below).
## Quick Start
### Option 1: Use Pre-computed Satellite Data (Recommended for HF)
The app expects satellite features (NDVI, EVI, VIIRS) to be pre-computed. To replicate the Begum et al. 2022 audit:
1. Download the pre-processed dataset with satellite features
2. Place `Islam2019_WithGeocodesAndSatData.Rdata` in the app directory
3. Select "Use Example (Islam 2019)" in the app
### Option 2: Upload Your Own CSV
Your CSV should include:
- Treatment assignment column (e.g., `begum_treat` with values 1=control, 2=treatment)
- Satellite features: `ndvi_median`, `viirs_median` (or similar)
- Any other columns for reference
Example CSV structure:
```
id,begum_treat,ndvi_median,viirs_median
1,1,0.45,2.3
2,2,0.52,3.1
...
```
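Before uploading, it can help to confirm that the header contains the columns you plan to select in the app. The sketch below is one way to do that with Python's standard library. The function name `missing_columns` and the default column list are illustrative assumptions based on the example above, not part of the app.

```python
import csv
import io

# Column names taken from the example CSV above; adjust to your own data.
REQUIRED_COLUMNS = ("begum_treat", "ndvi_median", "viirs_median")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header.

    The comparison is exact, so `NDVI_median` does not match `ndvi_median`.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


example = "id,begum_treat,ndvi_median,viirs_median\n1,1,0.45,2.3\n2,2,0.52,3.1\n"
print(missing_columns(example))
```

An empty list means every required column is present. Any names returned are the ones to rename or add before upload.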
## Setting Up GEE for New Data
The app uses **pre-computed** satellite features. To add GEE capabilities for computing features on-the-fly:
### Prerequisites
1. Google Earth Engine account (free): https://earthengine.google.com/signup/
2. Python environment with `earthengine-api`
### Installation Steps
```bash
# Install Earth Engine Python API
pip install earthengine-api
# Authenticate (first time only)
earthengine authenticate
# Check that initialization works; in a Python script, use the same two calls:
#   import ee
#   ee.Initialize()
python -c "import ee; ee.Initialize()"
```
### Computing Satellite Features
Use the GEE code from `RemoteAuditOfBrokenRCT.R` to compute features:
```python
import ee


def compute_satellite_features(lat, lon, start_date, end_date):
    """
    Compute NDVI, EVI, and VIIRS features for a location.

    Args:
        lat, lon: Coordinates in decimal degrees
        start_date, end_date: Date range (YYYY-MM-DD)

    Returns:
        dict with ndvi_median, evi_median, viirs_median
        (values are None where no valid pixel was found)
    """
    point = ee.Geometry.Point([lon, lat])

    # MODIS vegetation indices, rescaled by the product's 0.0001 scale factor
    modis = (ee.ImageCollection('MODIS/061/MOD13Q1')
             .filterDate(start_date, end_date)
             .select(['NDVI', 'EVI'])
             .map(lambda img: img.multiply(0.0001)))
    ndvi_median = modis.select('NDVI').median()
    evi_median = modis.select('EVI').median()

    # VIIRS nighttime lights
    viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
             .filterDate(start_date, end_date)
             .select(['avg_rad']))
    viirs_median = viirs.median()

    # Sample all three bands at the location (250 m scale)
    sample = (ndvi_median.addBands(evi_median).addBands(viirs_median)
              .sample(point, 250)
              .first()
              .getInfo())

    # The sample is empty when a band is masked at this point
    properties = sample['properties'] if sample else {}
    return {
        'ndvi_median': properties.get('NDVI'),
        'evi_median': properties.get('EVI'),
        'viirs_median': properties.get('avg_rad')
    }
```
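To build a full feature table, the function above would be called once per unit. The following is a minimal sketch of that loop. The helper `add_features`, the `lat`/`lon` column names, and the example dates are hypothetical, and a stand-in feature function is used so the sketch runs without Earth Engine credentials.

```python
def add_features(rows, feature_fn, start_date, end_date):
    """Attach satellite features to each row (a dict with 'lat' and 'lon').

    `feature_fn` is expected to behave like compute_satellite_features:
    it takes (lat, lon, start_date, end_date) and returns a dict of features.
    """
    enriched = []
    for row in rows:
        features = feature_fn(float(row["lat"]), float(row["lon"]),
                              start_date, end_date)
        # Keep the original columns and add the feature columns alongside.
        enriched.append({**row, **features})
    return enriched


# Stand-in for a real Earth Engine call (hypothetical values).
def fake_features(lat, lon, start_date, end_date):
    return {"ndvi_median": 0.5, "viirs_median": 2.0}


units = [{"id": "1", "lat": "23.8", "lon": "90.4"}]
print(add_features(units, fake_features, "2018-01-01", "2018-12-31"))
```

In real use, `compute_satellite_features` would be passed in place of `fake_features`. Each call makes its own round trip to Earth Engine, so this pattern is likely to be slow for large samples.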
### Integration with R (via reticulate)
To integrate GEE in your R workflow:
```r
library(reticulate)
# Set Python environment
Sys.setenv(RETICULATE_PYTHON = "/path/to/python")
# Import Earth Engine
ee <- import("ee")
ee$Initialize()
# Call Python function from R
compute_features <- py_run_string("
def get_features(lat, lon):
    # Your GEE code here
    return {'ndvi_median': ..., 'viirs_median': ...}
")
features <- compute_features$get_features(lat, lon)
```
## App Configuration on Hugging Face
### Required Files
- `app.R` - Main Shiny application
- `Dockerfile` - Container configuration
- `Islam2019_WithGeocodesAndSatData.Rdata` - Example dataset (optional)
### Environment Variables
None required for basic functionality.
### Secrets (if adding GEE)
If you want to enable on-the-fly GEE queries:
1. Add `GOOGLE_APPLICATION_CREDENTIALS` secret in HF Space settings
2. Upload service account JSON
3. Modify app to call GEE API
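On the Python side, step 3 might look roughly like the sketch below. It assumes that `GOOGLE_APPLICATION_CREDENTIALS` holds the path to the service account key file and that the key file contains the usual `client_email` field. The helper names are hypothetical and this is not the app's own code.

```python
import json
import os


def service_account_email(key_path):
    """Read the `client_email` field from a service account key file."""
    with open(key_path, encoding="utf-8") as fh:
        return json.load(fh)["client_email"]


def initialize_earth_engine(key_path=None):
    """Initialize Earth Engine with service account credentials.

    Assumes GOOGLE_APPLICATION_CREDENTIALS points at the key file
    when no path is given explicitly.
    """
    import ee  # imported lazily so the helper above works without the package

    key_path = key_path or os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    credentials = ee.ServiceAccountCredentials(
        service_account_email(key_path), key_path)
    ee.Initialize(credentials)
```

Calling `initialize_earth_engine()` once at startup would then replace the interactive `earthengine authenticate` step, which is not available inside a container.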
## Usage Guide
### Running a Randomization Audit
1. **Load Data**: Upload CSV or use example
2. **Configure Audit**:
   - Audit Type: "Randomization"
   - Treatment Column: Select column (e.g., `begum_treat`)
   - Control Value: 1
   - Treatment Value: 2
3. **Select Features**: Check `ndvi_median` and `viirs_median`
4. **Choose Learner**: Logistic (fast) or XGBoost (more flexible)
5. **Set Parameters**:
   - K-Folds: 5-10 (higher = more robust)
   - Resamples: 1000-2000 (higher = more precise p-value)
6. **Run Audit**: Click "Run Audit" button
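To get a feel for what "more precise" means for the Resamples setting, the Monte Carlo standard error of an estimated p-value can be approximated with the binomial formula sqrt(p(1 - p) / R). The sketch below applies it to the suggested range. The function name is illustrative, and the formula is the standard approximation rather than something taken from the app.

```python
import math


def p_value_standard_error(p, resamples):
    """Approximate Monte Carlo standard error of an estimated p-value."""
    return math.sqrt(p * (1.0 - p) / resamples)


for resamples in (1000, 2000):
    print(resamples, round(p_value_standard_error(0.05, resamples), 4))
```

For a true p-value near 0.05, this gives a standard error of roughly 0.007 with 1000 resamples and roughly 0.005 with 2000, so doubling the resamples tightens the estimate only modestly.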
### Running a Missingness Audit
Same steps but:
- Audit Type: "Missingness"
- Select variable to check for missing data patterns
### Interpreting Results
- **p < 0.05**: Assignment is MORE predictable from satellite features than would be expected under the stated randomization → potential deviation from the stated design
- **p ≥ 0.05**: No evidence of deviation detected (this does not prove that randomization was carried out as stated)
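The logic behind the p-value can be illustrated with a deliberately simplified randomization test. This sketch is not the app's procedure. It swaps the cross-validated learner for a plain difference in feature means, and it assumes complete randomization with fixed arm sizes. All function names are hypothetical.

```python
import random


def mean_difference(assignments, feature, treated=2, control=1):
    """Absolute difference in the feature's mean between the two arms."""
    t = [x for a, x in zip(assignments, feature) if a == treated]
    c = [x for a, x in zip(assignments, feature) if a == control]
    return abs(sum(t) / len(t) - sum(c) / len(c))


def randomization_p_value(assignments, feature, resamples=1000, seed=0):
    """Share of re-drawn assignments with a statistic at least as large
    as the one observed."""
    rng = random.Random(seed)
    observed = mean_difference(assignments, feature)
    shuffled = list(assignments)
    extreme = 0
    for _ in range(resamples):
        rng.shuffle(shuffled)  # re-draw the assignment, keeping arm sizes fixed
        if mean_difference(shuffled, feature) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (1 + extreme) / (1 + resamples)


# Toy data: treated units (coded 2) sit in visibly greener locations.
assignments = [1] * 10 + [2] * 10
ndvi = [0.30 + 0.01 * i for i in range(10)] + [0.60 + 0.01 * i for i in range(10)]
print(randomization_p_value(assignments, ndvi))
```

With the toy data, assignment lines up perfectly with NDVI, so almost no re-drawn assignment matches the observed gap and the p-value is small. When the feature carries no information about assignment, the p-value is large.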
## Technical Notes
### Computation Time
- Logistic: ~1-3 minutes for 500 units, 1000 resamples
- XGBoost: ~3-10 minutes (depends on tree settings)
### Memory Requirements
- Small datasets (<1000 units): 2GB RAM sufficient
- Large datasets (>5000 units): Consider 4GB+ RAM
### Handling Missing Satellite Data
If your CSV has missing satellite features:
- App will drop rows with missing values
- Consider imputation before upload, or
- Use GEE to compute features for missing locations
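To see in advance how many rows would survive the drop, a quick complete-case count can be run on the CSV before upload. The sketch below uses only the standard library. Treating empty cells, `NA`, and `NaN` as missing is an assumption about how the data were exported, and the helper names are hypothetical.

```python
import csv
import io


def is_missing(value):
    """Treat absent cells, empty strings, 'NA', and 'NaN' as missing."""
    text = (value or "").strip()
    return text == "" or text.upper() in {"NA", "NAN"}


def complete_rows(csv_text, columns):
    """Return the rows that have a value in every listed column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows
            if not any(is_missing(row.get(col)) for col in columns)]


data = (
    "id,begum_treat,ndvi_median,viirs_median\n"
    "1,1,0.45,2.3\n"
    "2,2,NA,3.1\n"
    "3,2,0.52,\n"
)
kept = complete_rows(data, ["begum_treat", "ndvi_median", "viirs_median"])
print(len(kept), "complete row(s):", [row["id"] for row in kept])
```

If the count is much lower than the number of units, imputing or recomputing the missing features before upload is probably worthwhile.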
## Troubleshooting
### "Feature not found" error
- Check that your CSV has columns named exactly: `ndvi_median`, `viirs_median`
- Column names are case-sensitive
### "Too few complete cases" error
- Ensure at least 10 units have both valid treatment assignment and satellite features
- Check for NA values in your data
### GEE authentication issues
```bash
# Re-authenticate
earthengine authenticate
# Check credentials
python -c "import ee; ee.Initialize(); print('Success!')"
```
### Dockerfile build fails
```bash
# Test locally
docker build -t remote-audit .
docker run -p 7860:7860 remote-audit
```
## Citation
If you use this app, please cite:
```
Jerzak, C. T., & Daoud, A. (2025). Remote Auditing: Design-Based Tests
of Randomization, Selection, and Missingness with Broadly Accessible
Satellite Imagery.
```
## Support
For issues or questions:
- Check the paper's technical appendix
- Review example code in `RemoteAuditOfBrokenRCT.R`
- Contact: [your contact info]