Iridium-193 committed on
Commit
fe24273
·
verified ·
1 Parent(s): 49dd243

Upload folder using huggingface_hub

Files changed (3)
  1. HF_SPACE_GUIDE.md +109 -0
  2. README.md +1 -0
  3. requirements.txt +2 -2
HF_SPACE_GUIDE.md ADDED
@@ -0,0 +1,109 @@
# HF Space Deployment & Dataset Guide

## Files to Upload to HF Space

```
your-hf-space/
├── README.md            # HF Space metadata (YAML header)
├── app.py               # Main application
├── requirements.txt     # Python dependencies
├── finetuned_best.pth   # Model checkpoint
└── src/
    ├── data_collection.py    # Runtime dependency
    └── collection_common.py  # Runtime dependency
```
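Before uploading, a quick local check that every file in this layout is present can save a broken first deploy. This is a convenience sketch, not part of the project; `missing_files` is a hypothetical helper:

```python
from pathlib import Path

# Files the Space needs, per the layout above.
REQUIRED = [
    "README.md",
    "app.py",
    "requirements.txt",
    "finetuned_best.pth",
    "src/data_collection.py",
    "src/collection_common.py",
]

def missing_files(root: str = ".") -> list[str]:
    """Return the required files that are absent under root."""
    return [f for f in REQUIRED if not (Path(root) / f).exists()]
```

Run it from the project root and upload only once it returns an empty list.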

## Space README.md

The `README.md` on the HF Space repo **must** start with this YAML frontmatter:

```yaml
---
title: Soil Texture Classification
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

Note that `colorFrom`/`colorTo` only accept a fixed palette (red, yellow, green, blue, indigo, purple, pink, gray). This file is separate from the local project `README.md`.
34
+
35
+ ## Space Secrets
36
+
37
+ Set these in **Space Settings β†’ Repository secrets**:
38
+
39
+ | Secret | Value | Purpose |
40
+ |---|---|---|
41
+ | `HF_TOKEN` | Your HF write-access token | Auth for dataset upload |
42
+ | `HF_CONTRIB_DATASET_REPO` | `your-username/soil-submissions` | Private dataset repo to receive exported data |
43
+ | `CONTRIBUTION_DATA_DIR` | `/data/community_submissions` | Uses HF persistent storage (survives restarts) |
44
+
45
+ **Enable persistent storage** in Space Settings so uploaded data survives container restarts.
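At startup, `app.py` can read these secrets from the environment. A minimal sketch, where the local-development fallback path is an assumption, not something the guide specifies:

```python
import os

# Space secrets arrive as environment variables; names match the table above.
HF_TOKEN = os.environ.get("HF_TOKEN")  # None when unset
DATASET_REPO = os.environ.get("HF_CONTRIB_DATASET_REPO", "")
# Fall back to a local folder for dev runs outside the Space (assumed path).
DATA_DIR = os.environ.get("CONTRIBUTION_DATA_DIR", "./community_submissions")

if not HF_TOKEN:
    # The app can still serve predictions without a token,
    # but dataset export has to be disabled.
    print("Warning: HF_TOKEN not set; dataset export disabled.")
```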
46
+
47
+ ## How Data Flows
48
+
49
+ 1. Users upload images + labels via **Contribute Data** or **Dataset Management** tabs.
50
+ 2. Data is saved to persistent storage at `CONTRIBUTION_DATA_DIR`.
51
+ 3. The Space auto-exports submission bundles to `HF_CONTRIB_DATASET_REPO` at **23:50 UTC daily** or when disk usage exceeds 90%.
52
+ 4. Owner downloads from the private dataset repo (see below).
53
+
54
+ The download mechanism is invisible to users β€” no download UI is exposed.
55
+
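The export trigger in step 3 can be sketched with the standard library. This illustrates the described behavior, not the Space's actual code; `should_export` is a hypothetical name:

```python
import os
import shutil
from datetime import datetime, timezone

def should_export(data_dir: str) -> bool:
    """True at the daily export slot (23:50 UTC by default) or when
    disk usage on data_dir's filesystem exceeds the threshold."""
    hour = int(os.environ.get("CONTRIBUTION_DAILY_EXPORT_HOUR_UTC", "23"))
    minute = int(os.environ.get("CONTRIBUTION_DAILY_EXPORT_MINUTE_UTC", "50"))
    max_pct = float(os.environ.get("CONTRIBUTION_MAX_USAGE_PERCENT", "90"))

    now = datetime.now(timezone.utc)
    at_daily_slot = (now.hour, now.minute) == (hour, minute)

    usage = shutil.disk_usage(data_dir)
    used_pct = 100.0 * usage.used / usage.total
    return at_daily_slot or used_pct > max_pct
```

A scheduler thread polling this once a minute would reproduce both triggers.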

## How to Download Uploaded Dataset

First, create a **private dataset repo** on HF (e.g., `your-username/soil-submissions`).

### Option A: Python script (recommended)

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-username/soil-submissions",
    repo_type="dataset",
    local_dir="./downloaded_data",
    token="hf_YOUR_TOKEN",
)
```

### Option B: HF CLI one-liner

```bash
huggingface-cli download your-username/soil-submissions \
  --repo-type dataset \
  --local-dir ./downloaded_data \
  --token hf_YOUR_TOKEN
```

### Option C: Full sync + curation pipeline

Downloads, deduplicates, and curates data into a train-ready format:

```bash
pip install pandas  # extra dependency for this script

python src/sync_space_data.py \
  --dataset_repo your-username/soil-submissions \
  --date 2026-02-14
```

Outputs curated data to `data/labeled/community/`, ready for training.

### Option D: HF Web UI

Go to `https://huggingface.co/datasets/your-username/soil-submissions` and download files directly from the browser.

## Optional Environment Variables

| Variable | Default | Description |
|---|---|---|
| `CONTRIBUTION_DAILY_EXPORT_HOUR_UTC` | `23` | Hour (UTC) for daily auto-export |
| `CONTRIBUTION_DAILY_EXPORT_MINUTE_UTC` | `50` | Minute (UTC) for daily auto-export |
| `CONTRIBUTION_MAX_USAGE_PERCENT` | `90` | Disk usage % that triggers an immediate export |
| `CONTRIBUTION_DEDUPLICATE_IMAGES` | `1` | Deduplicate identical images by hash |
| `CONTRIBUTION_PRUNE_AFTER_EXPORT` | `0` | Delete local data after a successful export |
| `CONTRIBUTION_STORAGE_QUOTA_BYTES` | `0` | Custom storage quota (`0` = use disk total) |
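To illustrate the `CONTRIBUTION_DEDUPLICATE_IMAGES` toggle, here is a minimal sketch of hash-based deduplication. It shows the assumed behavior only; `dedup_images` is hypothetical and the Space's implementation may differ:

```python
import hashlib
from pathlib import Path

def dedup_images(folder: str) -> list[Path]:
    """Keep the first copy of each byte-identical file under folder,
    delete the rest, and return the paths that were removed."""
    seen: set[str] = set()
    removed: list[Path] = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()       # byte-identical duplicate: drop it
            removed.append(path)
        else:
            seen.add(digest)
    return removed
```

Sorting before hashing makes "first copy wins" deterministic across runs.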
README.md CHANGED
@@ -5,6 +5,7 @@ colorFrom: yellow
 colorTo: green
 sdk: gradio
 sdk_version: "4.44.0"
+python_version: "3.10"
 app_file: app.py
 pinned: false
 ---
requirements.txt CHANGED
@@ -12,7 +12,7 @@ opencv-python-headless>=4.8.0
 matplotlib>=3.7.0

 # WebUI - Required for Gradio interface
-gradio>=4.0.0
+gradio==4.44.0

 # Hub sync/export (Space -> Dataset)
-huggingface_hub>=0.26.0
+huggingface_hub>=0.26.0,<1.0.0