thbndi committed on
Commit
aa3ada1
·
verified ·
1 Parent(s): f25ad53

Initial MoveTSA dataset builder (gated Gradio app)

Files changed (4)
  1. README.md +44 -7
  2. __pycache__/app.cpython-314.pyc +0 -0
  3. app.py +114 -0
  4. requirements.txt +9 -0
README.md CHANGED
@@ -1,13 +1,50 @@
 ---
-title: Movetsa Builder
-emoji: 🐢
-colorFrom: yellow
-colorTo: gray
+title: MoveTSA Dataset Builder
+emoji: 🚗
+colorFrom: blue
+colorTo: indigo
 sdk: gradio
-sdk_version: 6.17.3
-python_version: '3.13'
+sdk_version: 5.9.1
 app_file: app.py
 pinned: false
+hf_oauth: true
+short_description: Generate parametrised MoveTSA windows, no raw access
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# MoveTSA dataset builder
+
+A gated Gradio Space that generates a **parametrised** MoveTSA windows parquet
+(HRV / BSI / ECG-derived respiration / simulator aggregates / DATEX / subjective
+labels) on demand — **without** giving users access to the raw recordings.
+
+## How it works
+
+- The raw recordings live in the **private** dataset `thbndi/movetsa-raw` and are
+  downloaded into the Space at startup. They never leave the Space.
+- The pipeline code is pulled from `thbndi/MoveTSA` and imported as the
+  `MoveTSA` package.
+- Authorised users sign in with Hugging Face, pick `window_size` / `overlap` /
+  `normalize` / baselines / familiarization, and download **only the generated
+  parquet**.
+
+## Access control (gated)
+
+The Space is public, but generation requires:
+1. signing in with Hugging Face (`hf_oauth`), and
+2. being listed in the `ALLOWLIST` Space variable (comma-separated usernames).
+
+Non-listed users are told to request access. Add a username by editing the
+`ALLOWLIST` variable in **Settings → Variables and secrets** — no redeploy of
+code needed.
+
+## Configuration (Settings → Variables and secrets)
+
+| Name | Kind | Purpose |
+|------|------|---------|
+| `HF_TOKEN` | **secret** | Fine-grained **read** token with access to `thbndi/movetsa-raw` and `thbndi/MoveTSA`. |
+| `ALLOWLIST` | variable | Comma-separated HF usernames allowed to generate (default: `thbndi`). |
+| `RAW_REPO` | variable (optional) | Raw dataset id (default `thbndi/movetsa-raw`). |
+| `CODE_REPO` | variable (optional) | Pipeline-code dataset id (default `thbndi/MoveTSA`). |
+
+> First load downloads ~13 GB of raw SIMU logs; expect a slow cold start. Enable
+> **persistent storage** on the Space to avoid re-downloading on each restart.
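Note on the cold-start remark in the README: enabling persistent storage only helps if the download cache actually lives on it. The sketch below shows one way this could be wired, assuming the persistent volume is mounted at `/data` (the mount point Hugging Face documents for Spaces). The helper `pick_cache_dir` and the `hf-cache` subdirectory are hypothetical names, not part of this commit; the returned path could be passed as the `cache_dir` argument of `snapshot_download`.

```python
import os


def pick_cache_dir(persistent_root="/data", subdir="hf-cache"):
    """Return a cache directory on persistent storage, or None if unavailable.

    Assumption: when persistent storage is enabled, it is mounted at
    `persistent_root` and is writable. Returning None lets the caller fall
    back to the library's default cache location.
    """
    if os.path.isdir(persistent_root) and os.access(persistent_root, os.W_OK):
        path = os.path.join(persistent_root, subdir)
        os.makedirs(path, exist_ok=True)  # create the subdirectory on first use
        return path
    return None
```

When the function returns `None`, the Space has no persistent volume and every restart re-downloads the raw recordings, which matches the behaviour the README warns about.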
__pycache__/app.cpython-314.pyc ADDED
Binary file (6.43 kB).
 
app.py ADDED
@@ -0,0 +1,114 @@
+"""MoveTSA dataset builder — Hugging Face Space (gated via HF login + allowlist).
+
+Generates a **parametrised** windows parquet (HRV / BSI / RSP / simulator
+aggregates / DATEX / subjective labels) from the PRIVATE raw recordings,
+entirely server-side. Authorised users only ever download the generated
+parquet — they never get access to the raw ECG/SIMU files.
+
+Backing repos (private, read with the ``HF_TOKEN`` Space secret):
+  - ``thbndi/movetsa-raw`` : raw recordings (stay inside the Space)
+  - ``thbndi/MoveTSA`` : pipeline code (downloaded at startup, importable
+    as the ``MoveTSA`` package)
+
+Access control: the Space is public, but generation requires signing in with
+Hugging Face and being on the allowlist (the ``ALLOWLIST`` Space variable, a
+comma-separated list of usernames).
+"""
+
+import os
+import sys
+import tempfile
+
+import gradio as gr
+from huggingface_hub import snapshot_download
+
+TOKEN = os.environ.get("HF_TOKEN")
+CODE_REPO = os.environ.get("CODE_REPO", "thbndi/MoveTSA")
+RAW_REPO = os.environ.get("RAW_REPO", "thbndi/movetsa-raw")
+OWNER = os.environ.get("OWNER_HANDLE", "thbndi")
+
+# Comma-separated HF usernames allowed to generate. Edit via the ALLOWLIST
+# Space *variable* (Settings → Variables and secrets) — no code change needed.
+ALLOWLIST = {
+    u.strip().lower()
+    for u in os.environ.get("ALLOWLIST", OWNER).split(",")
+    if u.strip()
+}
+
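Note on the allowlist parsing above: the same logic can be exercised in isolation to see how edge cases behave. This is a minimal sketch that mirrors the set comprehension in the diff; `parse_allowlist` and `is_allowed` are hypothetical helper names introduced for illustration, not functions from this commit.

```python
def parse_allowlist(raw, default="thbndi"):
    """Parse a comma-separated username list into a lowercase set.

    `raw` is the value of the ALLOWLIST variable, or None when it is unset,
    in which case `default` (the owner handle) is used instead.
    """
    source = raw if raw is not None else default
    return {name.strip().lower() for name in source.split(",") if name.strip()}


def is_allowed(username, allowlist):
    """Case-insensitive membership test; None (not signed in) is rejected."""
    return username is not None and username.strip().lower() in allowlist
```

One consequence worth knowing: the owner default applies only when the variable is unset. If `ALLOWLIST` is set to an empty string, the parsed set is empty and nobody can generate, the owner included.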
+# ----------------------------------------------------------------- startup
+# 1) Pipeline code → importable as the `MoveTSA` package (cwd on sys.path).
+snapshot_download(CODE_REPO, repo_type="dataset", token=TOKEN,
+                  local_dir="MoveTSA", allow_patterns=["*.py", "*.yaml"])
+sys.path.insert(0, os.getcwd())
+
+# 2) Raw recordings — private, kept inside the Space, never served to users.
+RAW_DIR = snapshot_download(RAW_REPO, repo_type="dataset", token=TOKEN,
+                            allow_patterns=["S*/**", "subjective_scores.csv"])
+
+from MoveTSA.export_hf_dataset import build_windows  # noqa: E402 (needs sys.path)
+
+
+def generate(window_size, overlap, normalize, include_baselines,
+             include_familiarization, profile: gr.OAuthProfile | None):
+    """Run the pipeline with the chosen parameters and return a parquet file."""
+    if profile is None:
+        return None, "🔒 Sign in with your Hugging Face account to generate."
+    if profile.username.lower() not in ALLOWLIST:
+        return None, (
+            f"⛔ Access not granted for **@{profile.username}**.\n\n"
+            f"Request access from **@{OWNER}** (to be added to the allowlist)."
+        )
+
+    df = build_windows(
+        RAW_DIR,
+        window_size=int(window_size),
+        window_overlap=float(overlap),
+        normalize=(None if normalize == "none" else normalize),
+        include_baselines=bool(include_baselines),
+        include_familiarization=bool(include_familiarization),
+        verbose=False,
+    )
+
+    fname = (f"movetsa_w{int(window_size)}_ov{int(float(overlap) * 100)}"
+             f"_{normalize}.parquet")
+    out = os.path.join(tempfile.mkdtemp(), fname)
+    df.to_parquet(out, index=False)
+    msg = (f"✅ **{len(df)} windows × {df.shape[1]} columns** "
+           f"({df['subject'].nunique()} subjects) — `{fname}`")
+    return out, msg
+
+
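Note on the file name built in `generate`: `int(float(overlap) * 100)` truncates rather than rounds, so a product that lands just below an integer loses one percent. For example, `0.29 * 100` evaluates to `28.999999999999996`, which `int()` turns into 28. Values on the slider's 0.05 grid are unlikely to hit this, so the sketch below is only a defensive variant; `parquet_name` is a hypothetical helper, not part of this commit.

```python
def parquet_name(window_size, overlap, normalize):
    """Build the output file name, rounding the overlap percentage."""
    # round() instead of int(): avoids truncating 28.999... down to 28.
    pct = round(float(overlap) * 100)
    return f"movetsa_w{int(window_size)}_ov{pct}_{normalize}.parquet"
```

With the default parameters this gives the same name as the code in the diff, `movetsa_w60_ov50_zscore.parquet`.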
+with gr.Blocks(title="MoveTSA dataset builder") as demo:
+    gr.Markdown(
+        "# 🚗 MoveTSA dataset builder\n"
+        "Generates a parametrised **HRV / BSI / RSP / simulator** parquet from "
+        "the (private) raw recordings. You download **only** the generated "
+        "parquet — never the raw data.\n\n"
+        "1. Sign in with Hugging Face. 2. Set the parameters. 3. Generate."
+    )
+    gr.LoginButton()
+    with gr.Row():
+        with gr.Column():
+            window_size = gr.Slider(15, 180, value=60, step=5,
+                                    label="window_size (s)")
+            overlap = gr.Slider(0.0, 0.9, value=0.5, step=0.05, label="overlap")
+            normalize = gr.Dropdown(["zscore", "center", "none"],
+                                    value="zscore", label="normalize")
+            include_baselines = gr.Checkbox(
+                value=True, label="include baselines (B1–B4)")
+            include_familiarization = gr.Checkbox(
+                value=True, label="include familiarization (F)")
+            btn = gr.Button("Generate the parquet", variant="primary")
+        with gr.Column():
+            status = gr.Markdown()
+            out_file = gr.File(label="generated parquet")
+
+    btn.click(
+        generate,
+        inputs=[window_size, overlap, normalize, include_baselines,
+                include_familiarization],
+        outputs=[out_file, status],
+    )
+
+if __name__ == "__main__":
+    demo.launch()
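Note on the `window_size` and `overlap` sliders: the commit does not show how `build_windows` turns these two values into windows, so the following is only an illustration of one common convention. It assumes `overlap` is the fraction of a window shared with the next one, giving a step of `window_size * (1 - overlap)`, and that only full windows are kept. `window_starts` is a hypothetical helper, not the pipeline's actual code.

```python
def window_starts(duration_s, window_size, overlap):
    """List start times (s) of full windows inside a recording of `duration_s`.

    Assumption: consecutive windows share a fraction `overlap` of their length.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = window_size * (1.0 - overlap)
    starts = []
    k = 0
    # Multiply by k instead of accumulating, to avoid float drift.
    while k * step + window_size <= duration_s:
        starts.append(k * step)
        k += 1
    return starts
```

Under this assumption, the default settings (60 s windows, 0.5 overlap) give a 30 s step, so a 180 s segment yields five windows starting at 0, 30, 60, 90 and 120 s.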
requirements.txt ADDED
@@ -0,0 +1,9 @@
+huggingface_hub>=0.25
+pandas
+numpy
+scipy
+mne
+neurokit2
+pyyaml
+tqdm
+pyarrow