Upload folder using huggingface_hub

Files changed:
- HF_SPACE_GUIDE.md +109 -0
- README.md +1 -0
- requirements.txt +2 -2

HF_SPACE_GUIDE.md (ADDED)
# HF Space Deployment & Dataset Guide

## Files to Upload to HF Space

```
your-hf-space/
├── README.md              # HF Space metadata (YAML header)
├── app.py                 # Main application
├── requirements.txt       # Python dependencies
├── finetuned_best.pth     # Model checkpoint
└── src/
    ├── data_collection.py    # Runtime dependency
    └── collection_common.py  # Runtime dependency
```
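The whole layout above can be pushed in a single call with `huggingface_hub`'s `upload_folder`. A minimal sketch; the repo id is a placeholder, and the import is deferred so the helper can be defined even where the library is absent:

```python
def push_space_files(folder: str, repo_id: str) -> None:
    """Upload a local folder to a Hugging Face Space repo.

    The import is lazy on purpose: defining the helper does not
    require huggingface_hub, only calling it does.
    """
    from huggingface_hub import HfApi  # pip install huggingface_hub

    api = HfApi()  # picks up the token from HF_TOKEN or a prior login
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",  # target a Space, not a model/dataset repo
    )

# Usage (hypothetical repo id):
# push_space_files("your-hf-space", "your-username/soil-texture-space")
```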
## Space README.md

The `README.md` in the HF Space repo **must** start with this YAML frontmatter:

```yaml
---
title: Soil Texture Classification
emoji: π
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

This is separate from the local project `README.md`.
## Space Secrets

Set these in **Space Settings → Repository secrets**:

| Secret | Value | Purpose |
|---|---|---|
| `HF_TOKEN` | Your HF write-access token | Auth for dataset upload |
| `HF_CONTRIB_DATASET_REPO` | `your-username/soil-submissions` | Private dataset repo that receives exported data |
| `CONTRIBUTION_DATA_DIR` | `/data/community_submissions` | Uses HF persistent storage (survives restarts) |

**Enable persistent storage** in Space Settings so uploaded data survives container restarts.
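Inside the Space, these secrets arrive as ordinary environment variables. A sketch of how the app might read them; the helper name is ours, and the fallback values mirror the table above:

```python
import os

def load_contrib_config() -> dict:
    """Read Space secrets from the environment, with the defaults
    described in the table above. HF_TOKEN has no safe default."""
    return {
        "token": os.environ.get("HF_TOKEN"),  # None if the secret is unset
        "dataset_repo": os.environ.get(
            "HF_CONTRIB_DATASET_REPO", "your-username/soil-submissions"
        ),
        "data_dir": os.environ.get(
            "CONTRIBUTION_DATA_DIR", "/data/community_submissions"
        ),
    }

cfg = load_contrib_config()
print(cfg["data_dir"])
```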
## How Data Flows

1. Users upload images + labels via the **Contribute Data** or **Dataset Management** tabs.
2. Data is saved to persistent storage at `CONTRIBUTION_DATA_DIR`.
3. The Space auto-exports submission bundles to `HF_CONTRIB_DATASET_REPO` at **23:50 UTC daily**, or immediately when disk usage exceeds 90%.
4. The owner downloads from the private dataset repo (see below).

The download mechanism is invisible to users: no download UI is exposed.
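The trigger in step 3 can be expressed as a pure predicate. A sketch of the decision logic; the function name and signature are ours, while the 23:50 UTC slot and the 90% ceiling come from the steps above:

```python
from datetime import datetime, timezone

def should_export(now: datetime, disk_usage_percent: float,
                  hour: int = 23, minute: int = 50,
                  max_usage: float = 90.0) -> bool:
    """Export when the daily UTC slot is reached, or immediately
    once disk usage exceeds the configured ceiling."""
    at_daily_slot = now.hour == hour and now.minute == minute
    over_quota = disk_usage_percent > max_usage
    return at_daily_slot or over_quota

# 12:00 UTC at 95% usage: exports via the quota path
print(should_export(datetime(2026, 2, 14, 12, 0, tzinfo=timezone.utc), 95.0))
```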
## How to Download Uploaded Dataset

First, create a **private dataset repo** on HF (e.g., `your-username/soil-submissions`).

### Option A: Python script (recommended)

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-username/soil-submissions",
    repo_type="dataset",
    local_dir="./downloaded_data",
    token="hf_YOUR_TOKEN",
)
```
### Option B: HF CLI one-liner

```bash
huggingface-cli download your-username/soil-submissions \
  --repo-type dataset \
  --local-dir ./downloaded_data \
  --token hf_YOUR_TOKEN
```
### Option C: Full sync + curation pipeline

Downloads, deduplicates, and curates data into a train-ready format:

```bash
pip install pandas  # extra dependency for this script

python src/sync_space_data.py \
  --dataset_repo your-username/soil-submissions \
  --date 2026-02-14
```

Outputs curated data to `data/labeled/community/`, ready for training.
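The deduplication step in this pipeline can be done by content hash. A stdlib-only sketch; the helper name is ours, not the actual API of `sync_space_data.py`:

```python
import hashlib
from pathlib import Path

def dedupe_by_hash(paths):
    """Keep the first file seen for each SHA-256 digest and
    return the unique paths in input order."""
    seen, unique = set(), []
    for p in map(Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique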
### Option D: HF Web UI

Go to `https://huggingface.co/datasets/your-username/soil-submissions` and download files directly from the browser.
## Optional Environment Variables

| Variable | Default | Description |
|---|---|---|
| `CONTRIBUTION_DAILY_EXPORT_HOUR_UTC` | `23` | Hour (UTC) of the daily auto-export |
| `CONTRIBUTION_DAILY_EXPORT_MINUTE_UTC` | `50` | Minute (UTC) of the daily auto-export |
| `CONTRIBUTION_MAX_USAGE_PERCENT` | `90` | Disk usage % that triggers an immediate export |
| `CONTRIBUTION_DEDUPLICATE_IMAGES` | `1` | Deduplicate identical images by hash |
| `CONTRIBUTION_PRUNE_AFTER_EXPORT` | `0` | Delete local data after a successful export |
| `CONTRIBUTION_STORAGE_QUOTA_BYTES` | `0` | Custom storage quota in bytes (`0` = use disk total) |
|
README.md
CHANGED
|
@@ -5,6 +5,7 @@ colorFrom: yellow
|
|
| 5 |
colorTo: green
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: "4.44.0"
|
|
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
---
|
|
|
|
| 5 |
colorTo: green
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: "4.44.0"
|
| 8 |
+
python_version: "3.10"
|
| 9 |
app_file: app.py
|
| 10 |
pinned: false
|
| 11 |
---
|
requirements.txt
CHANGED

```diff
@@ -12,7 +12,7 @@ opencv-python-headless>=4.8.0
 matplotlib>=3.7.0

 # WebUI - Required for Gradio interface
-gradio
+gradio==4.44.0

 # Hub sync/export (Space -> Dataset)
-huggingface_hub>=0.26.0
+huggingface_hub>=0.26.0,<1.0.0
```