Persistence Implementation Summary
Overview
Successfully implemented dual persistence for the ReliefWeb Annotation Gradio Space:
- Manual Download Button - Always available, no configuration needed
- Auto-Backup to HF Datasets - Optional cloud sync with automatic persistence
Changes Made
1. Dependencies (requirements.txt)
Added:
- huggingface_hub>=0.20.0 - For HF authentication and API
- datasets>=2.16.0 - For HF Datasets integration
2. Core Application (app.py)
New Imports
import os
from huggingface_hub import HfApi, login
from datasets import Dataset, load_dataset
ValidationAnnotator Class Updates
Constructor (__init__)
- Added hf_dataset_repo and hf_token parameters
- Auto-detects HF credentials from environment variables
- Attempts to login and enable HF Datasets if credentials available
- Loads annotations from both HF Datasets (cloud) and local file
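The credential detection described above could look roughly like the sketch below. The helper name `resolve_hf_credentials` and its `env` parameter are hypothetical, introduced only for illustration; the actual constructor in app.py may be structured differently. The sketch assumes explicit arguments take priority over the HF_DATASET_REPO and HF_TOKEN environment variables, and that auto-backup is enabled only when both values are present.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_hf_credentials(
    hf_dataset_repo: Optional[str] = None,
    hf_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (repo, token, enabled), preferring explicit arguments over env vars.

    Hypothetical helper: the name and signature are assumptions, not app.py's API.
    """
    env = os.environ if env is None else env
    repo = hf_dataset_repo or env.get("HF_DATASET_REPO")
    token = hf_token or env.get("HF_TOKEN")
    # Auto-backup only makes sense when both the repo and the token are known.
    enabled = bool(repo and token)
    return repo, token, enabled
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.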
New Method: _push_to_hf_datasets()
- Converts annotations dict to Dataset format
- Pushes to HF Hub with private visibility
- Called automatically after each annotation
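A minimal sketch of this conversion and push, assuming annotations are held as a dict keyed by item ID with a dict of fields per item (the "id" column name and both function names are illustrative assumptions, not taken from app.py). `Dataset.from_list` and `push_to_hub` are existing calls in the datasets library; the import is deferred so the conversion helper can be used without the library installed.

```python
from typing import Any, Dict, List


def annotations_to_records(annotations: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten {item_id: fields} into a list of row dicts, one row per annotation."""
    # Sort by ID so repeated pushes produce rows in a stable order.
    return [{"id": item_id, **fields} for item_id, fields in sorted(annotations.items())]


def push_annotations(annotations: Dict[str, Dict[str, Any]], repo_id: str, token: str) -> None:
    """Push all annotations to a private dataset repo, replacing its contents."""
    # Imported lazily so annotations_to_records works without `datasets` installed.
    from datasets import Dataset

    records = annotations_to_records(annotations)
    if not records:
        return  # nothing to push yet
    Dataset.from_list(records).push_to_hub(repo_id, private=True, token=token)
```

Because every call uploads the full set of rows, the cost of a push grows with the number of annotations.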
Updated Method: _load_annotations()
- First tries to load from HF Datasets (cloud backup)
- Then loads from local file (may have newer annotations)
- Merges both sources, with local taking precedence
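The merge rule above can be sketched as follows, assuming each annotation row carries a unique "id" field (an assumption; the real key used in app.py is not stated here). Both helper names are hypothetical. Cloud rows are loaded first and local rows overwrite them on conflict, which is what gives local annotations precedence.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line; yield nothing if the file is missing."""
    if not path.exists():
        return
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def merge_annotations(
    cloud_rows: Iterable[Dict[str, Any]],
    local_rows: Iterable[Dict[str, Any]],
    key: str = "id",
) -> Dict[str, Dict[str, Any]]:
    """Merge two sources by key; local rows win when the same key appears in both."""
    merged = {row[key]: row for row in cloud_rows}
    merged.update({row[key]: row for row in local_rows})
    return merged
```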
Updated Method: _save_annotation()
- Saves to local file (as before)
- Automatically pushes to HF Datasets if enabled
- Gracefully handles HF push failures
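One way to express the save-then-push behaviour is sketched below. It assumes the local file is JSONL and that each annotation is appended as one line; the function name and the `push` callback parameter are hypothetical. The point of the sketch is the ordering: the local write happens first, and a failed push is reported but never raised, so annotating can continue.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def save_annotation(
    annotation: Dict[str, Any],
    output_file: Path,
    push: Optional[Callable[[], None]] = None,
) -> bool:
    """Append to the local JSONL file, then try the optional push.

    Returns True if the push succeeded or was not requested, False if it failed.
    """
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(annotation, ensure_ascii=False) + "\n")

    if push is None:
        return True
    try:
        push()
        return True
    except Exception as exc:  # broad on purpose: a backup failure must not block annotating
        print(f"HF push failed, annotation kept locally: {exc}")
        return False
```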
UI Updates
New Download Button
download_btn = gr.DownloadButton(
    "💾 Download Annotations",
    value=str(annotator.output_file) if annotator.output_file.exists() else None,
    size="sm",
    variant="secondary"
)
- Updates automatically after each annotation
- Always available, no configuration needed
Status Indicator
- Shows "☁️ Auto-backup enabled" with link to dataset if HF enabled
- Shows "⚠️ Auto-backup disabled" with setup hint if not enabled
Button Click Handlers
- Accept and Reject buttons now chain to update download button
- Download button gets fresh file path after each annotation
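The chaining described above might be wired as in this sketch, which uses Gradio's `.click(...).then(...)` event chaining. The names `current_download_path`, `build_demo`, and `on_accept` are hypothetical, and the layout is reduced to a single Accept button; the real handlers in app.py take more inputs and outputs. Gradio is imported inside the builder so the path helper can be used on its own.

```python
from pathlib import Path
from typing import Callable, Optional


def current_download_path(output_file: Path) -> Optional[str]:
    """Return the file path for the download button, or None if nothing is saved yet."""
    return str(output_file) if output_file.exists() else None


def build_demo(output_file: Path, on_accept: Callable[[], None]):
    """Wire an Accept button so the download button refreshes after each click."""
    import gradio as gr  # imported lazily; only needed when building the UI

    with gr.Blocks() as demo:
        accept_btn = gr.Button("Accept")
        download_btn = gr.DownloadButton(
            "Download Annotations", value=current_download_path(output_file)
        )
        # Run the annotation handler first, then refresh the button's file value.
        accept_btn.click(on_accept).then(
            lambda: current_download_path(output_file), outputs=download_btn
        )
    return demo
```

The Reject button would presumably be chained the same way.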
Main Entry Point
# Get HF credentials from environment
hf_dataset_repo = os.getenv("HF_DATASET_REPO")
hf_token = os.getenv("HF_TOKEN")
# Pass to create_app
app = create_app(input_file, hf_dataset_repo, hf_token)
3. Documentation Updates
README.md
- Added persistence features to feature list
- New section: "Persistence Options"
- Instructions for both manual download and auto-backup
- Step-by-step HF Datasets setup guide
DEPLOYMENT.md
- New section: "Setting Up Persistence"
- Detailed 4-step guide for HF Datasets configuration
- Benefits of auto-backup listed
- Updated "Important Notes" section
How It Works
Manual Download (Always Available)
- User clicks "💾 Download Annotations" button
- Browser downloads the JSONL file immediately
- File contains all annotations made so far
- Works offline, no configuration needed
Auto-Backup (Optional)
On Startup:
- Checks for HF_TOKEN and HF_DATASET_REPO env vars
- If found, logs into HF and enables auto-backup
- Loads existing annotations from HF Dataset
On Each Annotation:
- Saves to local file (ephemeral)
- Converts all annotations to Dataset format
- Pushes to HF Hub (replaces entire dataset)
- Shows success/failure in console logs
On Space Restart:
- Loads annotations from HF Dataset
- Continues from where user left off
- Annotations already pushed to the HF Dataset are not lost
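Resuming after a restart amounts to skipping items that already have an annotation. A minimal sketch, assuming every input item and every annotation share an "id" field (an assumption) and using a hypothetical helper name:

```python
from typing import Any, Dict, Iterable, Optional, Sequence


def next_unannotated_index(
    items: Sequence[Dict[str, Any]],
    annotated_ids: Iterable[str],
    key: str = "id",
) -> Optional[int]:
    """Return the index of the first item without an annotation, or None if all are done."""
    done = set(annotated_ids)
    for index, item in enumerate(items):
        if item[key] not in done:
            return index
    return None
```

For example, with items "a", "b", "c" and annotations restored for "a" and "b", the annotator would resume at index 2.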
Configuration for Users
For Manual Download Only
No configuration needed! Just use the app and click download when desired.
For Auto-Backup
- Create HF Dataset: https://huggingface.co/new-dataset
- Get write token: https://huggingface.co/settings/tokens
- Add Space secrets:
  - HF_TOKEN: Your write token
  - HF_DATASET_REPO: username/dataset-name
- Restart Space
Benefits
Manual Download
✅ No setup required
✅ Works offline
✅ User controls when to backup
✅ Simple and reliable
Auto-Backup
✅ Survives Space restarts
✅ Automatic version control
✅ Resume from any device
✅ Collaborative annotation
✅ Easy dataset management
✅ No manual intervention needed
Testing
Local Testing
cd /Users/rafaelmacalaba/WBG/monitoring_of_datause/revalidation/analysis/unhcr_reliefweb/reliefweb_annotation
# Without HF Datasets (download only)
uv run app.py
# With HF Datasets
export HF_TOKEN="your_token"
export HF_DATASET_REPO="username/dataset-name"
uv run app.py
On HF Spaces
- Upload files to Space
- Optionally configure secrets
- Space auto-deploys
- Check console logs for HF Datasets status
Files Modified
- ✅ requirements.txt - Added HF dependencies
- ✅ app.py - Implemented dual persistence
- ✅ README.md - Documented features and setup
- ✅ DEPLOYMENT.md - Added configuration guide
Deployment Checklist
- Dependencies updated
- Download button implemented
- HF Datasets integration implemented
- UI indicators added
- Documentation updated
- Local testing completed
- Deploy to HF Spaces
- Configure HF secrets (optional)
- Test in production
Next Steps
- Deploy to HF Spaces (see DEPLOYMENT.md)
- Configure secrets for auto-backup (optional)
- Start annotating!
- Download periodically as backup
- Monitor HF Dataset for automatic backups
Status: ✅ Implementation Complete
Ready for Deployment: Yes
Breaking Changes: None (backward compatible)