
HuggingFace Dataset Management

Scripts for preparing and uploading datasets to HuggingFace.

Setup & Configuration

check-hf-vars.py

Verify HuggingFace environment variables are properly configured.

Usage:

python scripts/huggingface/check-hf-vars.py
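
The actual contents of check-hf-vars.py are not shown here. As a rough sketch of what such a check might look like, the helper below (the name `missing_hf_vars` is hypothetical) reports which of the variables listed under Environment Variables are unset or blank:

```python
import os
from typing import Mapping

# Variable names taken from the "Environment Variables" section of this README.
REQUIRED_VARS = ("HF_ORGANIZATION", "HF_USERNAME", "HF_TOKEN")


def missing_hf_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or contain only whitespace."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_hf_vars()` with no arguments inspects the current process environment; an empty list means all three variables are present.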

setup-huggingface.sh

Initial setup for HuggingFace integration (credentials, organization).

Usage:

./scripts/huggingface/setup-huggingface.sh

Preparation

reorganize_for_huggingface.py

Reorganizes data files into HuggingFace-compatible structure.

Usage:

python scripts/huggingface/reorganize_for_huggingface.py

finalize_huggingface_structure.py

Final validation and preparation of HuggingFace datasets.

Usage:

python scripts/huggingface/finalize_huggingface_structure.py

Upload Scripts

upload_to_huggingface.py

Main upload script. Uploads all datasets to HuggingFace.

Usage:

python scripts/huggingface/upload_to_huggingface.py

Requirements:

  • HuggingFace token (HF_TOKEN) set in the environment
  • HF_ORGANIZATION set in .env
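
To illustrate how these two requirements are typically used, here is a minimal sketch built on the `huggingface_hub` library. It is not the code of upload_to_huggingface.py; the function names and the one-folder-per-dataset layout are assumptions made for the example.

```python
import os
from pathlib import Path


def dataset_repo_id(organization: str, dataset_name: str) -> str:
    """Build an '<organization>/<dataset>' repository id."""
    return f"{organization}/{dataset_name}"


def upload_dataset_folder(folder: Path, dataset_name: str) -> str:
    """Create the dataset repo if it does not exist, then upload the folder.

    Assumes HF_ORGANIZATION and HF_TOKEN are already in the environment.
    """
    # Imported lazily so dataset_repo_id() works without huggingface_hub installed.
    from huggingface_hub import HfApi

    repo_id = dataset_repo_id(os.environ["HF_ORGANIZATION"], dataset_name)
    api = HfApi(token=os.environ["HF_TOKEN"])
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="dataset")
    return repo_id
```

Passing `exist_ok=True` lets the same call serve both the first upload and later updates of a dataset.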

Specific Uploads

  • upload_nonprofits_to_hf.py - Upload nonprofit datasets
  • upload_meetings_to_hf.py - Upload meeting datasets
  • upload_state_splits_to_hf.py - Upload state-partitioned data

Publishing & Deployment

deploy-huggingface.sh

Main deployment script. Builds and deploys to HuggingFace Spaces.

Usage:

./scripts/huggingface/deploy-huggingface.sh

publish_gold_datasets.py

Publish processed gold datasets to HuggingFace.

Usage:

python scripts/huggingface/publish_gold_datasets.py

delete_and_publish_all_datasets.py

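Dangerous! Deletes all existing datasets and republishes them from scratch (fresh start). Deleted datasets cannot be recovered, so run this only when a full reset is intended.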

Usage:

python scripts/huggingface/delete_and_publish_all_datasets.py

Error Recovery

retry_failed_datasets.py

Retry uploading datasets that failed previously.

Usage:

python scripts/huggingface/retry_failed_datasets.py
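
How retry_failed_datasets.py tracks and retries failures is not documented here. One common pattern for this kind of recovery is retrying with exponential backoff, sketched below. The `retry_uploads` function, its parameters, and the `upload` callback are all hypothetical.

```python
import time
from typing import Callable, Iterable


def retry_uploads(
    names: Iterable[str],
    upload: Callable[[str], None],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Try each upload up to `attempts` times; return the names that never succeeded."""
    still_failing = []
    for name in names:
        for attempt in range(attempts):
            try:
                upload(name)
                break  # success, move on to the next dataset
            except Exception:
                # Wait 1x, 2x, 4x ... base_delay between attempts, but not after the last one.
                if attempt < attempts - 1:
                    sleep(base_delay * 2 ** attempt)
        else:
            # The inner loop finished without a break: every attempt failed.
            still_failing.append(name)
    return still_failing
```

The returned list can be written to disk and fed into a later run, so that only datasets that are still failing get retried.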

fix_and_publish_failed.py

Fix and republish specific failed datasets.

Usage:

python scripts/huggingface/fix_and_publish_failed.py

Maintenance

hf-dataset-cleanup.sh

Clean up old/orphaned HuggingFace datasets.

Usage:

./scripts/huggingface/hf-dataset-cleanup.sh

force-hf-rebuild.sh

Force a complete rebuild and re-upload (clears the cache).

Usage:

./scripts/huggingface/force-hf-rebuild.sh

Workflow

  1. Setup: setup-huggingface.sh
  2. Check config: check-hf-vars.py
  3. Prepare data: reorganize_for_huggingface.py
  4. Finalize: finalize_huggingface_structure.py
  5. Upload: upload_to_huggingface.py
  6. Deploy: deploy-huggingface.sh
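
The six steps above can be chained in a small driver script. The following is only a sketch: it assumes it is run from the repository root, that the shell scripts are executable, and that every script lives under scripts/huggingface/ as shown in the usage lines. `command_for` and `run_workflow` are illustrative names.

```python
import subprocess
import sys
from pathlib import PurePosixPath

SCRIPT_DIR = PurePosixPath("scripts/huggingface")

# Same order as the numbered workflow in this README.
WORKFLOW = [
    "setup-huggingface.sh",
    "check-hf-vars.py",
    "reorganize_for_huggingface.py",
    "finalize_huggingface_structure.py",
    "upload_to_huggingface.py",
    "deploy-huggingface.sh",
]


def command_for(script: str) -> list[str]:
    """Build the argv list for one workflow script."""
    path = str(SCRIPT_DIR / script)
    if script.endswith(".py"):
        return [sys.executable, path]  # run with the current Python interpreter
    return ["./" + path]  # shell scripts are executed directly


def run_workflow() -> None:
    """Run every step in order and stop at the first failure."""
    for script in WORKFLOW:
        subprocess.run(command_for(script), check=True)
```

With `check=True`, a non-zero exit code raises `subprocess.CalledProcessError`, so a failed configuration check prevents the upload and deploy steps from running.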

Environment Variables

Required in .env:

HF_ORGANIZATION=CommunityOne
HF_USERNAME=CommunityOne
HF_TOKEN=hf_...
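
For reference, a .env file in this KEY=VALUE form can be read with a few lines of Python. This is a minimal sketch and not necessarily how the scripts load their configuration; `parse_env` is a hypothetical helper that handles only simple assignments, comments, and optional surrounding quotes.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double or single quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Keep the .env file out of version control, since HF_TOKEN grants write access to the account or organization.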