---
title: MoveTSA Dataset Builder
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
short_description: Generate parametrised MoveTSA windows, no raw access
---
# MoveTSA dataset builder
A gated Gradio Space that generates a **parametrised** MoveTSA windows parquet
(HRV / BSI / ECG-derived respiration / simulator aggregates / DATEX / subjective
labels) on demand, **without** giving users access to the raw recordings.
## How it works
- The raw recordings live in the **private** dataset `MoveTSA/movetsa-raw` and are
downloaded into the Space at startup. They never leave the Space.
- The pipeline code is pulled from `MoveTSA/MoveTSA` and imported as the
`MoveTSA` package.
- Authorised users sign in with Hugging Face, pick `window_size` / `overlap` /
`normalize` / baselines / familiarization, and download **only the generated
parquet**.
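
As a rough illustration of how `window_size` and `overlap` could interact, the sketch below computes window start indices over a recording. It assumes `overlap` is a fraction of the window length and that sizes are counted in samples; `window_starts` is a hypothetical helper, and the actual `MoveTSA` pipeline may define these parameters differently.

```python
def window_starts(n_samples: int, window_size: int, overlap: float) -> list[int]:
    """Return start indices of fixed-size windows that fit in n_samples.

    Assumption: `overlap` is a fraction in [0, 1) of `window_size`.
    This is illustrative only, not the MoveTSA implementation.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Step between consecutive window starts; never smaller than 1 sample.
    step = max(1, round(window_size * (1 - overlap)))
    # Only keep windows that fit entirely inside the recording.
    return list(range(0, n_samples - window_size + 1, step))
```

Under these assumptions, a 50% overlap halves the step between windows, so roughly twice as many rows end up in the generated parquet.
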
## Access control (gated)
The Space is public, but generation requires:
1. signing in with Hugging Face (`hf_oauth`), and
2. being listed in the `ALLOWLIST` Space variable (comma-separated usernames).

Non-listed users are told to request access from `@thbndi`. Add a username by
editing the `ALLOWLIST` variable in **Settings → Variables and secrets**; no
code redeploy is needed.
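
The allowlist check described above might look something like the following sketch. The helper names `parse_allowlist` and `is_allowed` are hypothetical, and the case-insensitive comparison is an assumption rather than confirmed behaviour of `app.py`. In a Gradio app with `hf_oauth` enabled, the username would typically come from the `gr.OAuthProfile` that Gradio passes to event handlers.

```python
import os
from typing import Optional


def parse_allowlist(raw: str) -> set[str]:
    """Split a comma-separated string into a set of normalised usernames."""
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_allowed(username: Optional[str], raw_allowlist: Optional[str] = None) -> bool:
    """Return True if `username` is listed; signed-out users (None) are rejected.

    Assumption: usernames are compared case-insensitively.
    """
    if raw_allowlist is None:
        # Space variables are exposed to the app as environment variables.
        raw_allowlist = os.environ.get("ALLOWLIST", "thbndi")
    if not username:
        return False
    return username.strip().lower() in parse_allowlist(raw_allowlist)
```

Because the variable is read when the check runs rather than hard-coded, editing `ALLOWLIST` in the Space settings is enough to grant or revoke access.
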
## Configuration (Settings → Variables and secrets)
| Name | Kind | Purpose |
|------|------|---------|
| `HF_TOKEN` | **secret** | Fine-grained **read** token with access to `MoveTSA/movetsa-raw` and `MoveTSA/MoveTSA`. |
| `ALLOWLIST` | variable | Comma-separated HF usernames allowed to generate (default: `thbndi`). |
| `RAW_REPO` | variable (optional) | Raw dataset id (default `MoveTSA/movetsa-raw`). |
| `CODE_REPO` | variable (optional) | Pipeline-code dataset id (default `MoveTSA/MoveTSA`). |
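
A minimal sketch of how the app could read these settings with the defaults from the table. `SpaceConfig` and `load_config` are hypothetical names, and treating an empty value as unset is an assumption; the real `app.py` may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    hf_token: Optional[str]
    allowlist: str
    raw_repo: str
    code_repo: str


def load_config(env: Optional[Mapping[str, str]] = None) -> SpaceConfig:
    """Read the Space variables and secrets, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return SpaceConfig(
        # Secret: no default, so a missing token shows up as None.
        hf_token=env.get("HF_TOKEN") or None,
        # Empty strings are treated as unset (assumption).
        allowlist=env.get("ALLOWLIST") or "thbndi",
        raw_repo=env.get("RAW_REPO") or "MoveTSA/movetsa-raw",
        code_repo=env.get("CODE_REPO") or "MoveTSA/MoveTSA",
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check locally without touching the Space settings.
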
> First load downloads ~13 GB of raw SIMU logs; expect a slow cold start. Enable
> **persistent storage** on the Space to avoid re-downloading on each restart.
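
One way to benefit from persistent storage is to download into the persistent volume and skip the download when a previous run already completed. The sketch below assumes the volume is mounted at `/data` (the usual mount point on Spaces) and uses a hypothetical `.complete` marker file; `pick_raw_dir` and `ensure_raw` are illustrative names, not functions from `app.py`.

```python
import os
from pathlib import Path
from typing import Optional


def pick_raw_dir(persistent_root: str = "/data", fallback_root: str = "/tmp") -> Path:
    """Prefer the persistent volume when it exists and is writable."""
    root = Path(persistent_root)
    if root.is_dir() and os.access(root, os.W_OK):
        return root / "movetsa-raw"
    # Without persistent storage the data is lost on every restart.
    return Path(fallback_root) / "movetsa-raw"


def ensure_raw(repo_id: str, token: Optional[str], target: Path) -> Path:
    """Download the raw dataset once; later calls return immediately."""
    marker = target / ".complete"  # hypothetical marker written after a full download
    if marker.exists():
        return target
    # Imported lazily so the path logic can be exercised without the dependency.
    from huggingface_hub import snapshot_download

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=repo_id, repo_type="dataset", local_dir=str(target), token=token
    )
    marker.touch()
    return target
```

With a layout like this, only the first cold start pays for the ~13 GB download; later restarts find the marker and reuse the files already on disk.
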