# HuggingFace Dataset Management
Scripts for preparing and uploading datasets to HuggingFace.
## Setup & Configuration
### check-hf-vars.py
Verify HuggingFace environment variables are properly configured.
**Usage:**
```bash
python scripts/huggingface/check-hf-vars.py
```
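The script's internals are not shown in this README. As a rough sketch of what such a check could look like, assuming it validates the three variables listed under Environment Variables (the function name `missing_hf_vars` is hypothetical):

```python
import os

# Variable names taken from the "Environment Variables" section of this README.
REQUIRED_VARS = ("HF_ORGANIZATION", "HF_USERNAME", "HF_TOKEN")


def missing_hf_vars(env=None):
    """Return the required variables that are unset or blank in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list from `missing_hf_vars()` means every required variable is present; anything else names the variables still to be set.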
### setup-huggingface.sh
Initial setup for HuggingFace integration (credentials, organization).
**Usage:**
```bash
./scripts/huggingface/setup-huggingface.sh
```
## Preparation
### reorganize_for_huggingface.py
Reorganizes data files into HuggingFace-compatible structure.
**Usage:**
```bash
python scripts/huggingface/reorganize_for_huggingface.py
```
### finalize_huggingface_structure.py
Final validation and preparation of HuggingFace datasets.
**Usage:**
```bash
python scripts/huggingface/finalize_huggingface_structure.py
```
## Upload Scripts
### upload_to_huggingface.py
**Main upload script** - uploads all datasets to HuggingFace.
**Usage:**
```bash
python scripts/huggingface/upload_to_huggingface.py
```
**Requirements:**
- `HF_TOKEN` set in the environment
- `HF_ORGANIZATION` set in `.env`
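To illustrate how these two requirements typically come together, here is a minimal sketch of uploading one folder as a dataset repository with the `huggingface_hub` library. It is not the code of `upload_to_huggingface.py`; the helper names `dataset_repo_id` and `upload_dataset_folder` are hypothetical, and the sketch assumes `huggingface_hub` is installed.

```python
import os


def dataset_repo_id(dataset_name, env=None):
    """Build an `<organization>/<dataset>` repo id from HF_ORGANIZATION."""
    env = os.environ if env is None else env
    org = env.get("HF_ORGANIZATION", "").strip()
    if not org:
        raise ValueError("HF_ORGANIZATION is not set")
    return f"{org}/{dataset_name}"


def upload_dataset_folder(folder_path, dataset_name):
    """Upload one local folder as a dataset repo under the organization."""
    # Third-party dependency, imported lazily so the helper above works without it.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN"])
    repo_id = dataset_repo_id(dataset_name)
    # Create the dataset repo if it does not exist yet, then push the folder.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=folder_path, repo_id=repo_id, repo_type="dataset")
    return repo_id
```

With `HF_ORGANIZATION=CommunityOne`, a dataset named `meetings` would resolve to the repo id `CommunityOne/meetings`.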
### Specific Uploads
- `upload_nonprofits_to_hf.py` - Upload nonprofit datasets
- `upload_meetings_to_hf.py` - Upload meeting datasets
- `upload_state_splits_to_hf.py` - Upload state-partitioned data
## Publishing & Deployment
### deploy-huggingface.sh
**Main deployment script** - builds and deploys to HuggingFace Spaces.
**Usage:**
```bash
./scripts/huggingface/deploy-huggingface.sh
```
### publish_gold_datasets.py
Publish processed gold datasets to HuggingFace.
**Usage:**
```bash
python scripts/huggingface/publish_gold_datasets.py
```
### delete_and_publish_all_datasets.py
**Dangerous!** Deletes all existing datasets and republishes them from scratch (fresh start).
**Usage:**
```bash
python scripts/huggingface/delete_and_publish_all_datasets.py
```
## Error Recovery
### retry_failed_datasets.py
Retry uploading datasets that failed previously.
**Usage:**
```bash
python scripts/huggingface/retry_failed_datasets.py
```
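How the script tracks and retries failures is not documented here. A common pattern for this kind of recovery is retrying with exponential backoff; the sketch below shows that pattern under the assumption that each upload is a callable that raises on failure (the `retry` helper and its parameters are hypothetical).

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    `sleep` is injectable so the delays can be observed without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Catching the broad `Exception` keeps the sketch short; a real script would likely retry only on network or rate-limit errors.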
### fix_and_publish_failed.py
Fix and republish specific failed datasets.
**Usage:**
```bash
python scripts/huggingface/fix_and_publish_failed.py
```
## Maintenance
### hf-dataset-cleanup.sh
Clean up old/orphaned HuggingFace datasets.
**Usage:**
```bash
./scripts/huggingface/hf-dataset-cleanup.sh
```
### force-hf-rebuild.sh
Force complete rebuild and reupload (clears cache).
**Usage:**
```bash
./scripts/huggingface/force-hf-rebuild.sh
```
## Workflow
1. Setup: `setup-huggingface.sh`
2. Check config: `check-hf-vars.py`
3. Prepare data: `reorganize_for_huggingface.py`
4. Finalize: `finalize_huggingface_structure.py`
5. Upload: `upload_to_huggingface.py`
6. Deploy: `deploy-huggingface.sh`
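The six steps above can be chained in a small driver. This is only a sketch: it assumes the scripts live under `scripts/huggingface/` as in the usage examples, that it is run from the repository root, and that each script exits non-zero on failure. The names `command_for` and `run_workflow` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT_DIR = Path("scripts/huggingface")

# Steps in the order listed in the Workflow section.
WORKFLOW = [
    "setup-huggingface.sh",
    "check-hf-vars.py",
    "reorganize_for_huggingface.py",
    "finalize_huggingface_structure.py",
    "upload_to_huggingface.py",
    "deploy-huggingface.sh",
]


def command_for(script):
    """Return the argv for one step.

    Python scripts run under the current interpreter; shell scripts are
    executed directly, as in the usage examples.
    """
    path = (SCRIPT_DIR / script).as_posix()
    if script.endswith(".py"):
        return [sys.executable, path]
    return [f"./{path}"]


def run_workflow(runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for script in WORKFLOW:
        runner(command_for(script), check=True)
```

Stopping at the first failure matters here because later steps (upload, deploy) depend on the output of the preparation steps.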
## Environment Variables
Required in `.env`:
```bash
HF_ORGANIZATION=CommunityOne
HF_USERNAME=CommunityOne
HF_TOKEN=hf_...
```
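For reference, a `.env` file in this shape is just `KEY=VALUE` lines. The sketch below is a deliberately simplified parser (the function name `parse_env` is hypothetical, and it does not handle multi-line values, `export` prefixes, or inline comments); the `python-dotenv` package is a more complete option, and whether these scripts use it is not stated here.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines into a dict.

    Blank lines, `#` comments, and lines without `=` are ignored;
    surrounding quotes around a value are stripped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values
```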