# Persistence Implementation Summary

## Overview

Successfully implemented **dual persistence** for the ReliefWeb Annotation Gradio Space:

1. **Manual Download Button**: always available, no configuration needed
2. **Auto-Backup to HF Datasets**: optional cloud sync with automatic persistence

## Changes Made

### 1. Dependencies (`requirements.txt`)

Added:

- `huggingface_hub>=0.20.0`: HF authentication and Hub API
- `datasets>=2.16.0`: HF Datasets integration

### 2. Core Application (`app.py`)

#### New Imports

```python
import os
from huggingface_hub import HfApi, login
from datasets import Dataset, load_dataset
```

#### ValidationAnnotator Class Updates

**Constructor (`__init__`)**

- Added `hf_dataset_repo` and `hf_token` parameters
- Auto-detects HF credentials from environment variables
- Attempts to log in and enables HF Datasets backup if credentials are available
- Loads annotations from both HF Datasets (cloud) and the local file
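
A minimal sketch of the credential detection described above. It assumes that explicit constructor arguments take precedence over the `HF_DATASET_REPO` and `HF_TOKEN` environment variables, and that auto-backup is enabled only when both values are present. `resolve_hf_config` and `try_hf_login` are hypothetical helper names, not functions taken from `app.py`.

```python
import os
from typing import Optional, Tuple


def resolve_hf_config(
    hf_dataset_repo: Optional[str] = None,
    hf_token: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str], bool]:
    """Hypothetical helper: merge constructor args with environment variables.

    Explicit arguments win; otherwise fall back to HF_DATASET_REPO / HF_TOKEN.
    Auto-backup is enabled only when both a repo and a token are known.
    """
    repo = hf_dataset_repo or os.getenv("HF_DATASET_REPO")
    token = hf_token or os.getenv("HF_TOKEN")
    return repo, token, bool(repo and token)


def try_hf_login(token: str) -> bool:
    """Hypothetical helper: log in, returning False instead of raising on failure."""
    try:
        from huggingface_hub import login  # imported lazily; optional dependency

        login(token=token)
        return True
    except Exception as exc:  # network errors, invalid token, missing package
        print(f"HF login failed, auto-backup disabled: {exc}")
        return False
```
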

**New Method: `_push_to_hf_datasets()`**

- Converts the annotations dict to `Dataset` format
- Pushes to the HF Hub with private visibility
- Called automatically after each annotation
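
A sketch of the push step. It assumes annotations are held in a dict keyed by item ID and that every annotation has the same fields, because `Dataset.from_list` takes its column names from the first row. `annotations_to_records` and `push_to_hf_datasets` are illustrative names. `Dataset.from_list` and `Dataset.push_to_hub(..., private=True, token=...)` are real `datasets` APIs.

```python
from typing import Any, Dict, List


def annotations_to_records(annotations: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten {item_id: annotation} into rows, sorted by ID for a stable row order."""
    return [{**value, "id": key} for key, value in sorted(annotations.items())]


def push_to_hf_datasets(annotations: Dict[str, Dict[str, Any]], repo_id: str, token: str) -> None:
    """Overwrite the dataset's default (train) split on the Hub with the current annotations."""
    from datasets import Dataset  # imported lazily so the helper above has no dependency

    records = annotations_to_records(annotations)
    if not records:
        return  # nothing to back up yet
    Dataset.from_list(records).push_to_hub(repo_id, private=True, token=token)
```

Rebuilding the whole dataset on every push is simple and matches the "replaces entire dataset" behaviour. The cost grows with the number of annotations, which is usually acceptable at manual-annotation scale.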

**Updated Method: `_load_annotations()`**

- First tries to load from HF Datasets (cloud backup)
- Then loads from the local file (which may have newer annotations)
- Merges both sources, with local taking precedence
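
A sketch of the load-and-merge order. It assumes the local file is JSONL with one annotation per line and an `id` field, and that the cloud copy was pushed to the default `train` split. The local dict is applied last, so its entries overwrite cloud entries that share an ID. That is the "local takes precedence" rule. All helper names here are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict

Annotations = Dict[str, Dict[str, Any]]


def load_cloud_annotations(repo_id: str, token: str) -> Annotations:
    """Best-effort load from the HF Dataset; returns {} if it is missing or unreachable."""
    try:
        from datasets import load_dataset  # imported lazily; optional dependency

        ds = load_dataset(repo_id, split="train", token=token)
        return {row["id"]: dict(row) for row in ds}
    except Exception as exc:
        print(f"Could not load HF Dataset {repo_id}: {exc}")
        return {}


def load_local_annotations(path: Path) -> Annotations:
    """Read a JSONL file; later lines for the same ID replace earlier ones."""
    annotations: Annotations = {}
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    annotations[record["id"]] = record
    return annotations


def merge_annotations(cloud: Annotations, local: Annotations) -> Annotations:
    """Cloud first, then local, so local entries win on conflicting IDs."""
    merged = dict(cloud)
    merged.update(local)
    return merged
```
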

**Updated Method: `_save_annotation()`**

- Saves to the local file (as before)
- Automatically pushes to HF Datasets if enabled
- Handles HF push failures gracefully
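
A sketch of save-then-push with graceful failure. It assumes the local file is an append-only JSONL log, so a re-annotated item appears twice and the last line wins when the file is reloaded. The `push` callback stands in for the Hub upload. `save_annotation` and its signature are hypothetical. A failed push is logged and swallowed, so the local save always completes.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional

Annotations = Dict[str, Dict[str, Any]]


def save_annotation(
    record: Dict[str, Any],
    output_file: Path,
    annotations: Annotations,
    push: Optional[Callable[[Annotations], None]] = None,
) -> bool:
    """Persist one annotation locally, then try to back up. Returns True if pushed."""
    annotations[record["id"]] = record

    # 1. Local save: append one JSON object per line.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")

    # 2. Optional cloud backup; never let a Hub error break annotation.
    if push is None:
        return False
    try:
        push(annotations)
        return True
    except Exception as exc:
        print(f"HF push failed, local copy kept: {exc}")
        return False
```
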

#### UI Updates

**New Download Button**

```python
download_btn = gr.DownloadButton(
    "💾 Download Annotations",
    value=str(annotator.output_file) if annotator.output_file.exists() else None,
    size="sm",
    variant="secondary",
)
```

- Updates automatically after each annotation
- Always available, no configuration needed

**Status Indicator**

- Shows "☁️ Auto-backup enabled" with a link to the dataset if HF is enabled
- Shows "⚠️ Auto-backup disabled" with a setup hint if not
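
A sketch of how the status text could be built. It assumes the text is rendered as Markdown and links to the dataset page at `https://huggingface.co/datasets/<repo_id>`. The function name and exact wording are illustrative.

```python
from typing import Optional


def backup_status_markdown(enabled: bool, repo_id: Optional[str]) -> str:
    """Return the auto-backup status line shown in the UI (illustrative wording)."""
    if enabled and repo_id:
        url = f"https://huggingface.co/datasets/{repo_id}"
        return f"☁️ Auto-backup enabled: [{repo_id}]({url})"
    return "⚠️ Auto-backup disabled. Set the `HF_TOKEN` and `HF_DATASET_REPO` secrets to enable it."
```
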

**Button Click Handlers**

- The Accept and Reject handlers are now chained to a follow-up step that updates the download button
- The download button receives a fresh file path after each annotation
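
A sketch of the chained click handlers. It assumes Gradio 4.x, where returning a component instance from an event function updates that component and `.then()` runs a follow-up step after the first handler finishes. `on_accept` and `on_reject` are hypothetical callbacks that save an annotation and return a status message. The real handlers in `app.py` take more inputs and outputs.

```python
from pathlib import Path
from typing import Callable, Optional


def current_download_path(output_file: Path) -> Optional[str]:
    """Value for gr.DownloadButton: the JSONL path, or None before the first save."""
    return str(output_file) if output_file.exists() else None


def build_demo(output_file: Path, on_accept: Callable[[], str], on_reject: Callable[[], str]):
    import gradio as gr  # imported lazily so current_download_path works without Gradio

    with gr.Blocks() as demo:
        status = gr.Markdown()
        with gr.Row():
            accept_btn = gr.Button("Accept", variant="primary")
            reject_btn = gr.Button("Reject", variant="stop")
        download_btn = gr.DownloadButton(
            "💾 Download Annotations",
            value=current_download_path(output_file),
            size="sm",
            variant="secondary",
        )

        def refresh_download():
            # Re-read the path after the save so the button serves the latest file.
            return gr.DownloadButton(value=current_download_path(output_file))

        # Save first, then refresh the download button.
        accept_btn.click(on_accept, outputs=status).then(refresh_download, outputs=download_btn)
        reject_btn.click(on_reject, outputs=status).then(refresh_download, outputs=download_btn)
    return demo
```

For example, `build_demo(Path("annotations.jsonl"), lambda: "Accepted", lambda: "Rejected").launch()` starts a toy version of the UI.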

#### Main Entry Point

```python
# Get HF credentials from the environment (both are optional)
hf_dataset_repo = os.getenv("HF_DATASET_REPO")
hf_token = os.getenv("HF_TOKEN")

# Pass them to create_app (input_file is defined elsewhere in app.py)
app = create_app(input_file, hf_dataset_repo, hf_token)
```

### 3. Documentation Updates

#### README.md

- Added persistence features to the feature list
- New section: "Persistence Options"
- Instructions for both manual download and auto-backup
- Step-by-step HF Datasets setup guide

#### DEPLOYMENT.md

- New section: "Setting Up Persistence"
- Detailed 4-step guide for HF Datasets configuration
- Lists the benefits of auto-backup
- Updated "Important Notes" section

## How It Works

### Manual Download (Always Available)

1. The user clicks the "💾 Download Annotations" button
2. The browser downloads the JSONL file immediately
3. The file contains all annotations made so far
4. No external service or configuration is needed

### Auto-Backup (Optional)

1. **On Startup**:
   - Checks for the `HF_TOKEN` and `HF_DATASET_REPO` environment variables
   - If both are found, logs in to HF and enables auto-backup
   - Loads existing annotations from the HF Dataset
2. **On Each Annotation**:
   - Saves to the local file (ephemeral: lost when the Space restarts)
   - Converts all annotations to `Dataset` format
   - Pushes to the HF Hub, replacing the entire dataset
   - Logs success or failure to the console
3. **On Space Restart**:
   - Loads annotations from the HF Dataset
   - Continues from where the user left off
   - No data loss, provided the last push succeeded
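
One way to "continue from where the user left off" after a restart is sketched below. It assumes the app walks a fixed, ordered list of item IDs and that the reloaded annotations are keyed by those IDs. `next_unannotated_index` is a hypothetical helper, not part of `app.py`.

```python
from typing import Any, Dict, Optional, Sequence


def next_unannotated_index(item_ids: Sequence[str], annotations: Dict[str, Any]) -> Optional[int]:
    """Return the position of the first item without an annotation, or None if all are done."""
    for index, item_id in enumerate(item_ids):
        if item_id not in annotations:
            return index
    return None
```
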

## Configuration for Users

### For Manual Download Only

**No configuration needed!** Just use the app and click download when desired.

### For Auto-Backup

1. Create an HF Dataset: https://huggingface.co/new-dataset
2. Get a write token: https://huggingface.co/settings/tokens
3. Add Space secrets:
   - `HF_TOKEN`: your write token
   - `HF_DATASET_REPO`: `username/dataset-name`
4. Restart the Space
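
Steps 1 and 3 can also be scripted with `huggingface_hub`. This sketch assumes you have a write token and know the target Space ID. `HfApi.create_repo(..., repo_type="dataset", exist_ok=True)` and `HfApi.add_space_secret` are real methods. `setup_backup` is a hypothetical name, and `looks_like_repo_id` is only a loose format check, not the Hub's full naming rules.

```python
def looks_like_repo_id(repo_id: str) -> bool:
    """Loose check for the `username/dataset-name` shape (not the Hub's full rules)."""
    parts = repo_id.split("/")
    return len(parts) == 2 and all(part.strip() for part in parts)


def setup_backup(dataset_repo: str, space_id: str, token: str) -> None:
    """Hypothetical helper: create the dataset repo and store both Space secrets."""
    if not looks_like_repo_id(dataset_repo):
        raise ValueError(f"Expected 'username/dataset-name', got {dataset_repo!r}")

    from huggingface_hub import HfApi  # imported lazily; requires network access

    api = HfApi(token=token)
    # Step 1: create the private dataset repo (no error if it already exists).
    api.create_repo(dataset_repo, repo_type="dataset", private=True, exist_ok=True)
    # Step 3: store both values as Space secrets; restart the Space afterwards.
    api.add_space_secret(space_id, "HF_TOKEN", token)
    api.add_space_secret(space_id, "HF_DATASET_REPO", dataset_repo)
```

For example, `setup_backup("username/dataset-name", "username/space-name", token)` covers steps 1 and 3. You still need to restart the Space (step 4).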

## Benefits

### Manual Download

- ✅ No setup required
- ✅ No external service needed
- ✅ User controls when to back up
- ✅ Simple and reliable

### Auto-Backup

- ✅ Survives Space restarts
- ✅ Automatic version history (each push is a commit to the dataset repo)
- ✅ Resume from any device
- ✅ Collaborative annotation
- ✅ Easy dataset management
- ✅ No manual intervention needed

## Testing

### Local Testing

```bash
cd /Users/rafaelmacalaba/WBG/monitoring_of_datause/revalidation/analysis/unhcr_reliefweb/reliefweb_annotation

# Without HF Datasets (download only)
uv run app.py

# With HF Datasets
export HF_TOKEN="your_token"
export HF_DATASET_REPO="username/dataset-name"
uv run app.py
```
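
After an annotation session with auto-backup enabled, you can confirm the cloud copy has caught up by comparing IDs. This sketch assumes that both the local JSONL and the dataset rows carry an `id` field, that the dataset uses the default `train` split, and that the same environment variables as in the commands above are exported. `missing_from_cloud` and `check_backup` are hypothetical helpers.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List


def missing_from_cloud(local_ids: Iterable[str], cloud_ids: Iterable[str]) -> List[str]:
    """IDs saved locally that are not yet in the HF Dataset, sorted."""
    return sorted(set(local_ids) - set(cloud_ids))


def check_backup(local_file: Path) -> List[str]:
    from datasets import load_dataset  # imported lazily; needs network and credentials

    with local_file.open(encoding="utf-8") as fh:
        local_ids = [json.loads(line)["id"] for line in fh if line.strip()]
    cloud = load_dataset(os.environ["HF_DATASET_REPO"], split="train", token=os.environ["HF_TOKEN"])
    missing = missing_from_cloud(local_ids, cloud["id"])
    print(f"{len(set(local_ids))} local, {cloud.num_rows} in cloud, {len(missing)} missing")
    return missing
```

An empty list means every locally saved annotation is also present in the HF Dataset.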

### On HF Spaces

1. Upload the files to the Space
2. Optionally configure secrets
3. The Space auto-deploys
4. Check the console logs for HF Datasets status

## Files Modified

1. ✅ `requirements.txt`: added HF dependencies
2. ✅ `app.py`: implemented dual persistence
3. ✅ `README.md`: documented features and setup
4. ✅ `DEPLOYMENT.md`: added configuration guide

## Deployment Checklist

- [x] Dependencies updated
- [x] Download button implemented
- [x] HF Datasets integration implemented
- [x] UI indicators added
- [x] Documentation updated
- [x] Local testing completed
- [ ] Deploy to HF Spaces
- [ ] Configure HF secrets (optional)
- [ ] Test in production

## Next Steps

1. **Deploy to HF Spaces** (see DEPLOYMENT.md)
2. **Configure secrets** for auto-backup (optional)
3. **Start annotating!**
4. **Download periodically** as a backup
5. **Monitor the HF Dataset** for automatic backups

---

**Status**: ✅ Implementation Complete

**Ready for Deployment**: Yes

**Breaking Changes**: None (backward compatible)