metadata
title: MoveTSA Dataset Builder
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
short_description: Generate parametrised MoveTSA windows, no raw access

MoveTSA dataset builder

A gated Gradio Space that generates a parametrised MoveTSA windows Parquet file (HRV, BSI, ECG-derived respiration, simulator aggregates, DATEX, subjective labels) on demand, without giving users access to the raw recordings.

How it works

  • The raw recordings live in the private dataset MoveTSA/movetsa-raw and are downloaded into the Space at startup. They never leave the Space.
  • The pipeline code is pulled from MoveTSA/MoveTSA and imported as the MoveTSA package.
  • Authorised users sign in with Hugging Face, pick window_size / overlap / normalize / baselines / familiarization, and download only the generated parquet.
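The startup step described above could be sketched roughly as follows. This is an illustration, not the Space's actual app.py: the helper names `download_plan` and `fetch_all` and the `raw`/`code` sub-folder layout are hypothetical, and both repositories are assumed to be dataset repos fetched with `huggingface_hub.snapshot_download`.

```python
from typing import Callable, Dict, List, Optional


def download_plan(raw_repo: str, code_repo: str,
                  token: Optional[str], cache_dir: str) -> List[Dict]:
    """Build keyword arguments for one snapshot_download call per repo."""
    # Assumption: both repos are dataset repos read with the same token.
    common = {"repo_type": "dataset", "token": token}
    return [
        {**common, "repo_id": raw_repo, "local_dir": f"{cache_dir}/raw"},
        {**common, "repo_id": code_repo, "local_dir": f"{cache_dir}/code"},
    ]


def fetch_all(plan: List[Dict],
              download: Optional[Callable[..., str]] = None) -> List[str]:
    """Run every planned download and return the local folder paths."""
    if download is None:
        # Imported lazily so the plan can be inspected without the library.
        from huggingface_hub import snapshot_download as download
    return [download(**kwargs) for kwargs in plan]
```

Calling `fetch_all(download_plan("MoveTSA/movetsa-raw", "MoveTSA/MoveTSA", token, "/data"))` once at startup would leave both snapshots on the Space's disk; only the generated parquet is ever handed back to the user.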

Access control (gated)

The Space is public, but generation requires:

  1. signing in with Hugging Face (hf_oauth), and
  2. being listed in the ALLOWLIST Space variable (comma-separated usernames).

Users who are not listed are told to request access from @thbndi. To add a username, edit the ALLOWLIST variable in Settings → Variables and secrets; no code redeploy is needed.
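A minimal sketch of the two-step gate, assuming the signed-in username is available as a string (for example from Gradio's OAuth profile) and that usernames are compared case-insensitively. The function names `parse_allowlist` and `is_allowed` are hypothetical and not taken from the Space's code.

```python
from typing import Optional, Set


def parse_allowlist(raw: str) -> Set[str]:
    """Split a comma-separated ALLOWLIST value into lowercase usernames."""
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_allowed(username: Optional[str], raw_allowlist: str) -> bool:
    """Return True only for a signed-in user who is on the allowlist."""
    if not username:  # None or empty string: the user is not signed in
        return False
    return username.strip().lower() in parse_allowlist(raw_allowlist)
```

With `ALLOWLIST` set to `thbndi, alice`, `is_allowed("alice", ...)` passes, while an anonymous visitor (`None`) or an unlisted user is rejected before any generation starts.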

Configuration (Settings → Variables and secrets)

| Name | Kind | Purpose |
| --- | --- | --- |
| HF_TOKEN | secret | Fine-grained read token with access to MoveTSA/movetsa-raw and MoveTSA/MoveTSA. |
| ALLOWLIST | variable | Comma-separated HF usernames allowed to generate (default: thbndi). |
| RAW_REPO | variable (optional) | Raw dataset id (default: MoveTSA/movetsa-raw). |
| CODE_REPO | variable (optional) | Pipeline-code dataset id (default: MoveTSA/MoveTSA). |
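Spaces expose variables and secrets to the app as environment variables, so the settings above could be read along these lines. The `SpaceConfig` class and `load_config` function are hypothetical names used for illustration; the defaults are the ones listed in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    hf_token: Optional[str]
    allowlist: str
    raw_repo: str
    code_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    """Read the Space settings, falling back to the documented defaults."""
    return SpaceConfig(
        hf_token=env.get("HF_TOKEN") or None,  # secret; no default
        allowlist=env.get("ALLOWLIST") or "thbndi",
        raw_repo=env.get("RAW_REPO") or "MoveTSA/movetsa-raw",
        code_repo=env.get("CODE_REPO") or "MoveTSA/MoveTSA",
    )
```

Using `or` rather than a plain default means an empty variable is treated the same as an unset one, which is a design choice of this sketch.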

First load downloads ~13 GB of raw SIMU logs; expect a slow cold start. Enable persistent storage on the Space to avoid re-downloading on each restart.
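One way to take advantage of persistent storage is sketched below, assuming the persistent disk is mounted at `/data` (the usual location on Spaces) and that a marker file records a finished download. The `movetsa` sub-folder, the marker name, and both function names are hypothetical.

```python
import os
import tempfile


def choose_data_dir(persistent_root: str = "/data") -> str:
    """Prefer the persistent disk when it is mounted and writable."""
    if os.path.isdir(persistent_root) and os.access(persistent_root, os.W_OK):
        return os.path.join(persistent_root, "movetsa")
    # No persistent storage: fall back to ephemeral temp space.
    return os.path.join(tempfile.gettempdir(), "movetsa")


def needs_download(data_dir: str, marker: str = ".download_complete") -> bool:
    """Return True until a marker file shows a previous download finished."""
    return not os.path.exists(os.path.join(data_dir, marker))
```

At startup the app would download only when `needs_download(choose_data_dir())` is true and write the marker afterwards, so a restart with persistent storage enabled skips the ~13 GB transfer.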