Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
| # HuggingFace Dataset Management | |
| Scripts for preparing and uploading datasets to HuggingFace. | |
| ## Setup & Configuration | |
| ### check-hf-vars.py | |
| Verify HuggingFace environment variables are properly configured. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/check-hf-vars.py | |
| ``` | |
| ### setup-huggingface.sh | |
| Initial setup for HuggingFace integration (credentials, organization). | |
| **Usage:** | |
| ```bash | |
| ./scripts/huggingface/setup-huggingface.sh | |
| ``` | |
| ## Preparation | |
| ### reorganize_for_huggingface.py | |
| Reorganizes data files into HuggingFace-compatible structure. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/reorganize_for_huggingface.py | |
| ``` | |
| ### finalize_huggingface_structure.py | |
| Final validation and preparation of HuggingFace datasets. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/finalize_huggingface_structure.py | |
| ``` | |
| ## Upload Scripts | |
| ### upload_to_huggingface.py | |
| **Main upload script** - uploads all datasets to HuggingFace. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/upload_to_huggingface.py | |
| ``` | |
| **Requirements:** | |
| - HuggingFace token in environment | |
| - HF_ORGANIZATION set in .env | |
| ### Specific Uploads | |
| - `upload_nonprofits_to_hf.py` - Upload nonprofit datasets | |
| - `upload_meetings_to_hf.py` - Upload meeting datasets | |
| - `upload_state_splits_to_hf.py` - Upload state-partitioned data | |
| ## Publishing & Deployment | |
| ### deploy-huggingface.sh | |
| **Main deployment script** - builds and deploys to HuggingFace Spaces. | |
| **Usage:** | |
| ```bash | |
| ./scripts/huggingface/deploy-huggingface.sh | |
| ``` | |
| ### publish_gold_datasets.py | |
| Publish processed gold datasets to HuggingFace. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/publish_gold_datasets.py | |
| ``` | |
| ### delete_and_publish_all_datasets.py | |
| **Dangerous!** Deletes and republishes all datasets (fresh start). | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/delete_and_publish_all_datasets.py | |
| ``` | |
| ## Error Recovery | |
| ### retry_failed_datasets.py | |
| Retry uploading datasets that failed previously. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/retry_failed_datasets.py | |
| ``` | |
| ### fix_and_publish_failed.py | |
| Fix and republish specific failed datasets. | |
| **Usage:** | |
| ```bash | |
| python scripts/huggingface/fix_and_publish_failed.py | |
| ``` | |
| ## Maintenance | |
| ### hf-dataset-cleanup.sh | |
| Clean up old/orphaned HuggingFace datasets. | |
| **Usage:** | |
| ```bash | |
| ./scripts/huggingface/hf-dataset-cleanup.sh | |
| ``` | |
| ### force-hf-rebuild.sh | |
| Force complete rebuild and reupload (clears cache). | |
| **Usage:** | |
| ```bash | |
| ./scripts/huggingface/force-hf-rebuild.sh | |
| ``` | |
| ## Workflow | |
| 1. Setup: `setup-huggingface.sh` | |
| 2. Check config: `check-hf-vars.py` | |
| 3. Prepare data: `reorganize_for_huggingface.py` | |
| 4. Finalize: `finalize_huggingface_structure.py` | |
| 5. Upload: `upload_to_huggingface.py` | |
| 6. Deploy: `deploy-huggingface.sh` | |
| ## Environment Variables | |
| Required in `.env`: | |
| ```bash | |
| HF_ORGANIZATION=CommunityOne | |
| HF_USERNAME=CommunityOne | |
| HF_TOKEN=hf_... | |
| ``` | |