# HuggingFace Dataset Management

Scripts for preparing and uploading datasets to HuggingFace.

## Setup & Configuration

### check-hf-vars.py
Verifies that the required HuggingFace environment variables are configured.

**Usage:**
```bash
python scripts/huggingface/check-hf-vars.py
```
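
The kind of check this script performs can be sketched as follows. This is a minimal illustration, not the script's actual code: the variable names come from the Environment Variables section of this README, and the helper `missing_hf_vars` is a hypothetical name introduced here.

```python
import os

# Names listed under "Environment Variables" in this README.
REQUIRED_VARS = ("HF_ORGANIZATION", "HF_USERNAME", "HF_TOKEN")


def missing_hf_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def report(env=None):
    """Build a one-line, human-readable status message."""
    missing = missing_hf_vars(env)
    if missing:
        return "Missing HuggingFace variables: " + ", ".join(missing)
    return "All HuggingFace variables are set."
```

Calling `report()` with no arguments inspects the current process environment; an empty list from `missing_hf_vars()` means every required variable is present.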

### setup-huggingface.sh
Initial setup for HuggingFace integration (credentials, organization).

**Usage:**
```bash
./scripts/huggingface/setup-huggingface.sh
```

## Preparation

### reorganize_for_huggingface.py
Reorganizes data files into HuggingFace-compatible structure.

**Usage:**
```bash
python scripts/huggingface/reorganize_for_huggingface.py
```

### finalize_huggingface_structure.py
Final validation and preparation of HuggingFace datasets.

**Usage:**
```bash
python scripts/huggingface/finalize_huggingface_structure.py
```

## Upload Scripts

### upload_to_huggingface.py
**Main upload script** - uploads all datasets to HuggingFace.

**Usage:**
```bash
python scripts/huggingface/upload_to_huggingface.py
```

**Requirements:**
- `HF_TOKEN` (HuggingFace token) set in the environment
- `HF_ORGANIZATION` set in `.env`

### Specific Uploads

- `upload_nonprofits_to_hf.py` - Upload nonprofit datasets
- `upload_meetings_to_hf.py` - Upload meeting datasets
- `upload_state_splits_to_hf.py` - Upload state-partitioned data

## Publishing & Deployment

### deploy-huggingface.sh
**Main deployment script** - builds and deploys to HuggingFace Spaces.

**Usage:**
```bash
./scripts/huggingface/deploy-huggingface.sh
```

### publish_gold_datasets.py
Publish processed gold datasets to HuggingFace.

**Usage:**
```bash
python scripts/huggingface/publish_gold_datasets.py
```

### delete_and_publish_all_datasets.py
**Dangerous!** Deletes all datasets and republishes them from scratch (fresh start).

**Usage:**
```bash
python scripts/huggingface/delete_and_publish_all_datasets.py
```
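
Because this operation is destructive, it may be worth gating it behind an explicit confirmation. The sketch below shows one possible guard; it is an assumption about how such a check could look, not a description of what the script does. The helpers `is_confirmed` and `prompt_for_confirmation` are hypothetical.

```python
def is_confirmed(expected_org, typed):
    """Return True only if the user typed the organization name exactly.

    An empty expected name never confirms, so a missing HF_ORGANIZATION
    cannot be "confirmed" by pressing Enter.
    """
    return bool(expected_org) and typed.strip() == expected_org


def prompt_for_confirmation(expected_org, ask=input):
    """Ask the user to retype the organization name before proceeding.

    `ask` is injectable so the prompt can be replaced in tests.
    """
    typed = ask(f"Type '{expected_org}' to delete and republish ALL datasets: ")
    return is_confirmed(expected_org, typed)
```

Requiring the organization name, rather than a plain `y`, makes it harder to confirm by reflex.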

## Error Recovery

### retry_failed_datasets.py
Retry uploading datasets that failed previously.

**Usage:**
```bash
python scripts/huggingface/retry_failed_datasets.py
```

### fix_and_publish_failed.py
Fix and republish specific failed datasets.

**Usage:**
```bash
python scripts/huggingface/fix_and_publish_failed.py
```

## Maintenance

### hf-dataset-cleanup.sh
Cleans up old or orphaned HuggingFace datasets.

**Usage:**
```bash
./scripts/huggingface/hf-dataset-cleanup.sh
```

### force-hf-rebuild.sh
Force complete rebuild and reupload (clears cache).

**Usage:**
```bash
./scripts/huggingface/force-hf-rebuild.sh
```

## Workflow

1. Setup: `setup-huggingface.sh`
2. Check config: `check-hf-vars.py`
3. Prepare data: `reorganize_for_huggingface.py`
4. Finalize: `finalize_huggingface_structure.py`
5. Upload: `upload_to_huggingface.py`
6. Deploy: `deploy-huggingface.sh`
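
The steps above could be chained in a small driver that stops at the first failure. This is a sketch under assumptions: it presumes the scripts live in `scripts/huggingface/` (as in the usage examples), that the shell scripts are executable, and that the command is run from the repository root. `build_command` and `run_workflow` are hypothetical helpers, and the sketch uses the current interpreter (`sys.executable`) in place of a bare `python`.

```python
import subprocess
import sys
from pathlib import PurePosixPath

SCRIPT_DIR = PurePosixPath("scripts/huggingface")

# Order taken from the Workflow list in this README.
WORKFLOW = [
    "setup-huggingface.sh",
    "check-hf-vars.py",
    "reorganize_for_huggingface.py",
    "finalize_huggingface_structure.py",
    "upload_to_huggingface.py",
    "deploy-huggingface.sh",
]


def build_command(script):
    """Return the argv list for one workflow script."""
    path = str(SCRIPT_DIR / script)
    if script.endswith(".py"):
        return [sys.executable, path]
    # Shell scripts are invoked directly, as in the usage examples.
    return ["./" + path]


def run_workflow(runner=subprocess.run):
    """Run every step in order; check=True aborts on the first failure."""
    for script in WORKFLOW:
        runner(build_command(script), check=True)
```

With the default `runner`, a non-zero exit code from any step raises `subprocess.CalledProcessError`, so later steps such as the upload never run against a half-prepared dataset.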

## Environment Variables

Required in `.env`:
```bash
HF_ORGANIZATION=CommunityOne
HF_USERNAME=CommunityOne
HF_TOKEN=hf_...
```
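
For reference, a `.env` file in this simple `KEY=VALUE` form can be read with a few lines of standard-library Python. This is a minimal sketch that assumes no multi-line values or variable interpolation; how the scripts themselves load `.env` is not stated here, and `parse_env_text` and `mask_token` are hypothetical helpers.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def mask_token(token, visible=3):
    """Hide all but the first few characters so tokens can be logged safely."""
    return token[:visible] + "*" * max(len(token) - visible, 0)
```

Masking the token before printing it keeps the secret out of logs and CI output while still showing that a value was found.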