---
title: MoveTSA Dataset Builder
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
short_description: Generate parametrised MoveTSA windows, no raw access
---

# MoveTSA dataset builder

A gated Gradio Space that generates a **parametrised** parquet file of MoveTSA
windows (HRV / BSI / ECG-derived respiration / simulator aggregates / DATEX /
subjective labels) on demand, **without** giving users access to the raw recordings.

## How it works

- The raw recordings live in the **private** dataset `MoveTSA/movetsa-raw` and are
  downloaded into the Space at startup. They never leave the Space.
- The pipeline code is pulled from `MoveTSA/MoveTSA` and imported as the
  `MoveTSA` package.
- Authorised users sign in with Hugging Face, pick `window_size` / `overlap` /
  `normalize` / baselines / familiarization, and download **only the generated
  parquet**.
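
The startup step described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Space's actual `app.py`: the helper names `plan_downloads` and `fetch_repos`, the `data/raw` and `data/code` layout, and the idea that the code repo holds a top-level `MoveTSA` package directory are all hypothetical. Both repos are treated as dataset repos, which is also an assumption. The only library call used is `snapshot_download` from `huggingface_hub`.

```python
import sys
from pathlib import Path


def plan_downloads(raw_repo="MoveTSA/movetsa-raw",
                   code_repo="MoveTSA/MoveTSA",
                   root="data"):
    """Return (repo_id, local_dir) pairs for the raw data and the pipeline code."""
    base = Path(root)
    # Hypothetical layout: raw recordings and pipeline code in sibling folders.
    return [(raw_repo, base / "raw"), (code_repo, base / "code")]


def fetch_repos(token, root="data"):
    """Download both repos into the Space and make the pipeline importable."""
    # Imported lazily so plan_downloads() works without the dependency installed.
    from huggingface_hub import snapshot_download

    local_dirs = []
    for repo_id, local_dir in plan_downloads(root=root):
        snapshot_download(
            repo_id=repo_id,
            repo_type="dataset",  # assumption: both repos are dataset repos
            local_dir=str(local_dir),
            token=token,
        )
        local_dirs.append(local_dir)

    # Assumption: the code repo contains a top-level `MoveTSA/` package directory,
    # so putting its parent on sys.path allows `import MoveTSA`.
    sys.path.insert(0, str(local_dirs[1]))
    return local_dirs
```

Calling `fetch_repos(token)` once at startup, before the Gradio interface is built, would keep the raw files on the Space's disk only; the UI then needs to expose nothing but the generated parquet.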

## Access control (gated)

The Space is public, but generation requires:
1. signing in with Hugging Face (`hf_oauth`), and
2. being listed in the `ALLOWLIST` Space variable (comma-separated usernames).

Users who are not listed are told to request access from `@thbndi`. To add a
username, edit the `ALLOWLIST` variable in **Settings → Variables and secrets**;
no code redeploy is needed.
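
The two-step gate above can be illustrated with a small check. This is a sketch, assuming the signed-in username is already available (in Gradio it would typically come from the `username` field of a `gr.OAuthProfile`); the function names `parse_allowlist` and `is_allowed` are hypothetical, and the case-insensitive comparison is an assumption rather than documented behaviour of this Space.

```python
import os


def parse_allowlist(raw):
    """Split a comma-separated string into a set of lowercase usernames."""
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_allowed(username, env=os.environ):
    """Return True only for a signed-in user who appears in ALLOWLIST."""
    if not username:
        # Not signed in: step 1 of the gate fails.
        return False
    # Default mirrors the documented default of the ALLOWLIST variable.
    allowed = parse_allowlist(env.get("ALLOWLIST", "thbndi"))
    # Assumption: usernames are compared case-insensitively.
    return username.strip().lower() in allowed
```

A generation handler would call `is_allowed(profile.username)` first and return the "request access" message instead of running the pipeline when it is `False`.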

## Configuration (Settings → Variables and secrets)

| Name | Kind | Purpose |
|------|------|---------|
| `HF_TOKEN` | **secret** | Fine-grained **read** token with access to `MoveTSA/movetsa-raw` and `MoveTSA/MoveTSA`. |
| `ALLOWLIST` | variable | Comma-separated HF usernames allowed to generate (default: `thbndi`). |
| `RAW_REPO` | variable (optional) | Raw dataset id (default `MoveTSA/movetsa-raw`). |
| `CODE_REPO` | variable (optional) | Pipeline-code dataset id (default `MoveTSA/MoveTSA`). |
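
Space variables and secrets are exposed to the app as environment variables, so the table above could be read into a single settings object along these lines. This is a sketch only: `SpaceConfig` and `load_config` are hypothetical names, and failing fast on a missing `HF_TOKEN` is a design assumption, not something the Space is known to do.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceConfig:
    hf_token: str
    allowlist: str
    raw_repo: str
    code_repo: str


def load_config(env=os.environ):
    """Read the documented settings, applying the documented defaults."""
    token = env.get("HF_TOKEN", "")
    if not token:
        # Assumption: without the token the private repos cannot be read,
        # so stopping early gives a clearer error than a failed download.
        raise RuntimeError("HF_TOKEN secret is not set")
    return SpaceConfig(
        hf_token=token,
        allowlist=env.get("ALLOWLIST", "thbndi"),
        raw_repo=env.get("RAW_REPO", "MoveTSA/movetsa-raw"),
        code_repo=env.get("CODE_REPO", "MoveTSA/MoveTSA"),
    )
```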

> First load downloads ~13 GB of raw SIMU logs; expect a slow cold start. Enable
> **persistent storage** on the Space to avoid re-downloading on each restart.