HuggingFace Dataset Management
Scripts for preparing and uploading datasets to HuggingFace.
Setup & Configuration
check-hf-vars.py
Verify HuggingFace environment variables are properly configured.
Usage:
python scripts/huggingface/check-hf-vars.py
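The contents of check-hf-vars.py are not shown here, so the following is only a minimal sketch of what such a check could look like. It assumes the three variable names listed under Environment Variables and reads only the process environment; the function name find_missing_vars is hypothetical.

```python
import os

# Variable names taken from the "Environment Variables" section of this README.
REQUIRED_VARS = ("HF_ORGANIZATION", "HF_USERNAME", "HF_TOKEN")


def find_missing_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: the real check-hf-vars.py may check more than this.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = find_missing_vars()
    if missing:
        print("Missing HuggingFace variables:", ", ".join(missing))
    else:
        print("All HuggingFace variables are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching real credentials.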
setup-huggingface.sh
Initial setup for HuggingFace integration (credentials, organization).
Usage:
./scripts/huggingface/setup-huggingface.sh
Preparation
reorganize_for_huggingface.py
Reorganizes data files into HuggingFace-compatible structure.
Usage:
python scripts/huggingface/reorganize_for_huggingface.py
finalize_huggingface_structure.py
Final validation and preparation of HuggingFace datasets.
Usage:
python scripts/huggingface/finalize_huggingface_structure.py
Upload Scripts
upload_to_huggingface.py
Main upload script - uploads all datasets to HuggingFace.
Usage:
python scripts/huggingface/upload_to_huggingface.py
Requirements:
- HuggingFace token in environment
- HF_ORGANIZATION set in .env
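As an illustration of how these two requirements might be used, here is a hedged sketch of a single dataset upload with the huggingface_hub library. It is not the code of upload_to_huggingface.py; the helper names dataset_repo_id and upload_dataset_folder are hypothetical, and it assumes the token is exposed as HF_TOKEN.

```python
import os


def dataset_repo_id(organization, dataset_name):
    """Build the '<organization>/<dataset>' id used for Hub repositories."""
    return f"{organization}/{dataset_name}"


def upload_dataset_folder(folder_path, dataset_name):
    """Upload one local folder as a dataset repo (hypothetical helper)."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import HfApi

    # Assumption: both values are already exported into the environment.
    repo_id = dataset_repo_id(os.environ["HF_ORGANIZATION"], dataset_name)
    api = HfApi(token=os.environ["HF_TOKEN"])

    # exist_ok=True makes the call safe to repeat for an existing repo.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=folder_path, repo_id=repo_id, repo_type="dataset")
    return repo_id
```

A call would look like upload_dataset_folder("path/to/local/folder", "nonprofits"), where both arguments are placeholders rather than paths or names confirmed by this repository.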
Specific Uploads
- upload_nonprofits_to_hf.py: Upload nonprofit datasets
- upload_meetings_to_hf.py: Upload meeting datasets
- upload_state_splits_to_hf.py: Upload state-partitioned data
Publishing & Deployment
deploy-huggingface.sh
Main deployment script - builds and deploys to HuggingFace Spaces.
Usage:
./scripts/huggingface/deploy-huggingface.sh
publish_gold_datasets.py
Publish processed gold datasets to HuggingFace.
Usage:
python scripts/huggingface/publish_gold_datasets.py
delete_and_publish_all_datasets.py
Dangerous! Deletes all existing datasets and republishes them from scratch (fresh start).
Usage:
python scripts/huggingface/delete_and_publish_all_datasets.py
Error Recovery
retry_failed_datasets.py
Retry uploading datasets that failed previously.
Usage:
python scripts/huggingface/retry_failed_datasets.py
fix_and_publish_failed.py
Fix and republish specific failed datasets.
Usage:
python scripts/huggingface/fix_and_publish_failed.py
Maintenance
hf-dataset-cleanup.sh
Clean up old/orphaned HuggingFace datasets.
Usage:
./scripts/huggingface/hf-dataset-cleanup.sh
force-hf-rebuild.sh
Force complete rebuild and reupload (clears cache).
Usage:
./scripts/huggingface/force-hf-rebuild.sh
Workflow
- Setup: setup-huggingface.sh
- Check config: check-hf-vars.py
- Prepare data: reorganize_for_huggingface.py
- Finalize: finalize_huggingface_structure.py
- Upload: upload_to_huggingface.py
- Deploy: deploy-huggingface.sh
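The six workflow steps can be chained in a small driver. The sketch below is an assumption about how one might automate the sequence, not a script that ships with this repository; build_commands and run_workflow are hypothetical names, and it assumes the commands are run from the repository root.

```python
import subprocess

SCRIPT_DIR = "scripts/huggingface"

# Script names in the order given by the workflow list.
WORKFLOW = [
    "setup-huggingface.sh",
    "check-hf-vars.py",
    "reorganize_for_huggingface.py",
    "finalize_huggingface_structure.py",
    "upload_to_huggingface.py",
    "deploy-huggingface.sh",
]


def build_commands(steps=WORKFLOW):
    """Turn script names into argv lists matching the Usage lines in this README."""
    commands = []
    for script in steps:
        path = f"{SCRIPT_DIR}/{script}"
        if script.endswith(".py"):
            commands.append(["python", path])
        else:
            commands.append([f"./{path}"])
    return commands


def run_workflow(commands=None):
    """Run each step in order, stopping at the first non-zero exit code."""
    for command in commands if commands is not None else build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Using check=True means a failed configuration check stops the run before anything is uploaded or deployed.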
Environment Variables
Required in .env:
HF_ORGANIZATION=CommunityOne
HF_USERNAME=CommunityOne
HF_TOKEN=hf_...
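How the scripts load .env is not stated here. As a minimal sketch, assuming simple KEY=VALUE lines like the ones above, a file of this shape can be parsed with the standard library alone; parse_env_text is a hypothetical helper, and a library such as python-dotenv would be a reasonable alternative.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Read a .env file from disk and return its variables as a dict."""
    with open(path, encoding="utf-8") as handle:
        return parse_env_text(handle.read())
```

Keep the real token out of version control; the hf_... value above is a placeholder and is left elided on purpose.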