---
title: MoveTSA Dataset Builder
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
short_description: Generate parametrised MoveTSA windows, no raw access
---

# MoveTSA dataset builder

A gated Gradio Space that generates a **parametrised** MoveTSA windows parquet
(HRV / BSI / ECG-derived respiration / simulator aggregates / DATEX / subjective
labels) on demand, **without** giving users access to the raw recordings.

## How it works

- The raw recordings live in the **private** dataset `MoveTSA/movetsa-raw` and
  are downloaded into the Space at startup. They never leave the Space.
- The pipeline code is pulled from `MoveTSA/MoveTSA` and imported as the
  `MoveTSA` package.
- Authorised users sign in with Hugging Face, pick `window_size` / `overlap` /
  `normalize` / baselines / familiarization, and download **only the generated
  parquet**.

## Access control (gated)

The Space is public, but generation requires:

1. signing in with Hugging Face (`hf_oauth`), and
2. being listed in the `ALLOWLIST` Space variable (comma-separated usernames).

Non-listed users are told to request access from `@thbndi`. Add a username by
editing the `ALLOWLIST` variable in **Settings → Variables and secrets**; no
redeploy of code is needed.

## Configuration (Settings → Variables and secrets)

| Name | Kind | Purpose |
|------|------|---------|
| `HF_TOKEN` | **secret** | Fine-grained **read** token with access to `MoveTSA/movetsa-raw` and `MoveTSA/MoveTSA`. |
| `ALLOWLIST` | variable | Comma-separated HF usernames allowed to generate (default: `thbndi`). |
| `RAW_REPO` | variable (optional) | Raw dataset id (default `MoveTSA/movetsa-raw`). |
| `CODE_REPO` | variable (optional) | Pipeline-code dataset id (default `MoveTSA/MoveTSA`). |

> First load downloads ~13 GB of raw SIMU logs; expect a slow cold start. Enable
> **persistent storage** on the Space to avoid re-downloading on each restart.
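
The allowlist gate described under "Access control" can be illustrated with a minimal sketch. This is not the code in `app.py`. The function names `parse_allowlist` and `is_authorised` are hypothetical, and the case-insensitive comparison is an assumption. Only the `ALLOWLIST` variable, its comma-separated format, and the `thbndi` default come from this README.

```python
import os

# Default taken from the configuration table in this README.
DEFAULT_ALLOWLIST = "thbndi"


def parse_allowlist(raw):
    """Split a comma-separated string into a set of usernames.

    Whitespace and empty entries are dropped. Lowercasing is an
    assumption made for this sketch, not documented behaviour.
    """
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_authorised(username, raw_allowlist=None):
    """Return True if the signed-in username appears in the allowlist.

    `username` is None when nobody is signed in. When `raw_allowlist`
    is omitted, the value is read from the ALLOWLIST environment
    variable, which is how Space variables reach the app at runtime.
    """
    if raw_allowlist is None:
        raw_allowlist = os.environ.get("ALLOWLIST", DEFAULT_ALLOWLIST)
    if not username:
        return False  # not signed in, so generation stays locked
    return username.strip().lower() in parse_allowlist(raw_allowlist)
```

Because the variable is read on every call rather than once at import, a check written this way responds to a changed `ALLOWLIST` without any change to the code, consistent with the note that no redeploy of code is needed.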