| title: TDB Intake | |
| emoji: 🔬 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: streamlit | |
| sdk_version: "1.39.0" | |
| app_file: app.py | |
| pinned: false | |
| # Trial Design Benchmark: Intake | |
| A Streamlit intake form for trial statisticians. Submissions are saved to a **Hugging Face Dataset** repo. An **Admin** page (in the sidebar) lets reviewers triage submissions (`pending` / `reviewed` / `needs_fix`). | |
| ## What it does | |
| - **Form (`app.py`)**: statisticians enter `trial_id`, `username`, and a list of questions. Each question has: | |
|   - `design_element` (dropdown; when "Others" is picked, a free-text input appears) | |
|   - `question_type` (dropdown: `extraction_only` / `derivation_required`) | |
|   - `question` (free text) | |
| - **Rubrics** are auto-generated by question type: | |
|   - `extraction_only` → 1 rubric: `output.json` | |
|   - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility} | |
|   - Each rubric collects `points`, `tolerance`, `criterion`. | |
| - **Versions**: every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and click **Load selected version** to pull it back into the form for editing; Submit then saves a new version. | |
| - **Admin page (`pages/1_Admin.py`)**: password-gated review console. It shows **only the latest version of each trial** (one row per `trial_id` + `username`). The questionnaire is rendered in the same layout as the form (read-only). Reviewers can add reviews **per question** *and* an overall review; the review history covers **all versions** (each review is tagged with its version, and per-question reviews are tied to their question). The trial's current status reflects the latest version's most recent overall review. Each review is its own file under `reviews/<trial>__<user>/<version>/`. (Submitters can still see and load all their own versions on the form.) | |
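The rubric rule in the list above can be sketched as a small lookup table. This is a minimal illustration and not the code in `lib/schema.py`: the names `RUBRIC_TEMPLATES` and `default_rubrics` are hypothetical, the field names follow the submission JSON shown later in this README, and the empty dimension for `extraction_only` is an assumption because the README does not name one.

```python
# Hypothetical sketch; names are illustrative, not taken from lib/schema.py.
# Each template is an (artifact, dimension) pair.
RUBRIC_TEMPLATES = {
    # Assumption: the single extraction rubric has no named dimension.
    "extraction_only": [("output.json", "")],
    "derivation_required": [
        ("output.json", "Inputs used"),
        ("output.json", "Calculated value"),
        ("output.json", "Method"),
        ("output.R", "Reproducibility"),
    ],
}


def default_rubrics(question_type):
    """Return blank rubric dicts for the given question type."""
    if question_type not in RUBRIC_TEMPLATES:
        raise ValueError(f"unknown question_type: {question_type!r}")
    return [
        {
            "artifact": artifact,
            "dimension": dimension,
            "points": "",
            "criterion": "",
            "tolerance": "",
        }
        for artifact, dimension in RUBRIC_TEMPLATES[question_type]
    ]
```

Calling `default_rubrics("derivation_required")` yields four blank rubrics that the form could then fill with `points`, `tolerance`, and `criterion`.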
| ## Run locally | |
| ```bash | |
| python -m venv .venv && source .venv/bin/activate | |
| pip install -r requirements.txt | |
| streamlit run app.py | |
| ``` | |
| Without HF env vars set, submissions land in `./data/submissions/<...>.json` on disk, which is fine for development. | |
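The fallback behaviour described above might look roughly like the sketch below. It is an assumption about how `lib/storage.py` could be organised, not a copy of it: `save_json` and `local_root` are hypothetical names, and the environment variable names are the ones listed in the deployment section. `HfApi.upload_file` is the real `huggingface_hub` call and is imported lazily so that local development does not need a token.

```python
import json
import os
from pathlib import Path


def save_json(rel_path, payload, local_root="data"):
    """Save payload to the HF dataset repo if configured, else to local disk.

    rel_path is a repo-relative path (hypothetical helper, not lib/storage.py).
    Returns ("hub", rel_path) or ("local", path_on_disk).
    """
    body = json.dumps(payload, indent=2).encode("utf-8")
    token = os.environ.get("HF_TOKEN")
    repo = os.environ.get("HF_DATASET_REPO")
    if token and repo:
        # Lazy import: only needed when the Hub backend is configured.
        from huggingface_hub import HfApi

        HfApi(token=token).upload_file(
            path_or_fileobj=body,
            path_in_repo=rel_path,
            repo_id=repo,
            repo_type="dataset",
            revision=os.environ.get("HF_DATASET_BRANCH") or "main",
        )
        return "hub", rel_path
    # Local fallback: mirror the repo-relative path under local_root.
    target = Path(local_root) / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return "local", str(target)
```

Keeping the same relative path for both backends means files written during development have the same layout as the dataset repo.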
| ## Deploy on Hugging Face Spaces | |
| ### 1. Create a private HF Dataset repo | |
| - Sign in at <https://huggingface.co> | |
| - Click your avatar → **New Dataset** | |
| - Owner: your username (e.g. `ttt-77`) | |
| - Name: e.g. `tdb-intake-submissions` | |
| - Visibility: **Private** | |
| - Create. Leave it empty. | |
| ### 2. Generate an HF access token | |
| - <https://huggingface.co/settings/tokens> → **New token** | |
| - Token type: **Write** | |
| - Save the `hf_...` string. | |
| ### 3. Create the Space | |
| - Click your avatar → **New Space** | |
| - Name: e.g. `tdb-intake` | |
| - SDK: **Streamlit** | |
| - Visibility: your choice (public works; the *form* is intended for public submission, and only the *data* needs to be private) | |
| - Create. HF gives you a git repo URL. | |
| ### 4. Push this code to the Space | |
| ```bash | |
| git remote add hf https://huggingface.co/spaces/<your-username>/tdb-intake | |
| git push hf main | |
| ``` | |
| Or, in the HF Space's **Settings → Repository**, link this GitHub repo and HF will auto-sync on push. | |
| ### 5. Add Space secrets | |
| In the Space → **Settings → Variables and secrets** → add as **secrets**: | |
| | Name | Value | | |
| | --- | --- | | |
| | `HF_TOKEN` | the token from step 2 | | |
| | `HF_DATASET_REPO` | `<your-username>/tdb-intake-submissions` | | |
| | `HF_DATASET_BRANCH` | `main` (optional, defaults to `main`) | | |
| | `ADMIN_PASSWORD` | a password to share with reviewers | | |
| The Space will restart automatically and pick up the new secrets. | |
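Spaces expose secrets to the app as environment variables, so the table above can be read with `os.environ`. The sketch below shows one way to do that, plus a constant-time password check; `load_settings` and `check_admin_password` are hypothetical names, and the actual check in `lib/storage.py` may differ.

```python
import hmac
import os


def load_settings(env=os.environ):
    """Collect the four settings from the secrets table (hypothetical helper)."""
    return {
        "token": env.get("HF_TOKEN"),
        "dataset_repo": env.get("HF_DATASET_REPO"),
        # Optional: falls back to "main" when unset or empty.
        "dataset_branch": env.get("HF_DATASET_BRANCH") or "main",
        "admin_password": env.get("ADMIN_PASSWORD"),
    }


def check_admin_password(candidate, expected):
    """Compare in constant time; an unset password never matches."""
    if not expected:
        return False
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```

Rejecting every attempt when `ADMIN_PASSWORD` is unset avoids accidentally opening the Admin page on a Space that was deployed without the secret.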
| ### 6. Test | |
| - Open the Space URL → fill the form → **Submit**. A file lands in `submissions/<trial_id>__<username>/<stamp>.json` in the dataset repo. Submitting again saves another version in the same folder. | |
| - Open the **Admin** page (left sidebar) → enter the password → see the submission with status `pending` → add a review (your name + status + comment). It appears in the review timeline and a new file lands under `reviews/<trial>__<user>/<stamp>/`. Add more reviews to build up the history. | |
| ## Dataset layout | |
| Every submit saves a **new version** under a per-pair folder; nothing is | |
| overwritten, so the full version history is kept and any version can be loaded | |
| back. Each review is a **separate file** keyed to a specific version, so a | |
| version can be reviewed many times by different people and concurrent reviews | |
| never conflict. | |
| ```text | |
| submissions/<trial>__<user>/<stamp>.json # one file per version | |
| reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json # one file per review of that version | |
| ``` | |
| To load/edit a previous version: on the form, enter the same `trial_id` + | |
| `username`, click **Find versions**, pick a version, click **Load selected | |
| version**, edit, then **Submit** (which saves a new version). | |
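The path scheme above can be expressed as a few helpers. This is a sketch with hypothetical function names; the stamp is passed in as an opaque string because its exact format is not specified here, and parsing assumes that `trial_id` itself contains no double underscore.

```python
from pathlib import PurePosixPath


def pair_key(trial_id, username):
    """Folder name shared by a pair's submissions and reviews."""
    return f"{trial_id}__{username}"


def submission_path(trial_id, username, stamp):
    """Repo-relative path of one submitted version."""
    return f"submissions/{pair_key(trial_id, username)}/{stamp}.json"


def review_dir(trial_id, username, stamp):
    """Repo-relative folder holding the reviews of one version."""
    return f"reviews/{pair_key(trial_id, username)}/{stamp}"


def parse_submission_path(path):
    """Split a submission path back into (trial_id, username, stamp).

    Assumption: trial_id contains no "__", so the first "__" is the separator.
    """
    p = PurePosixPath(path)
    trial_id, _, username = p.parent.name.partition("__")
    return trial_id, username, p.stem
```

For example, `parse_submission_path(submission_path("NCT0001", "jdoe", stamp))` returns the three parts again for any `stamp` string.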
| ### Submission file (`submissions/<trial>__<user>/<stamp>.json`) | |
| ```json | |
| { | |
|   "submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json", | |
|   "version": "2026-06-04T...Z", | |
|   "submittedAt": "2026-06-04T...", | |
|   "trial_id": "NCT0001", | |
|   "username": "jdoe", | |
|   "comparison": { | |
|     "trial_id": "NCT0001", | |
|     "username": "jdoe", | |
|     "prompts": [ | |
|       { | |
|         "id": "P-001", | |
|         "design_element": "Sample size and power", | |
|         "design_element_other": "", | |
|         "question": "Total target PFS events", | |
|         "question_type": "derivation_required", | |
|         "rubrics": [ | |
|           {"artifact": "output.json", "dimension": "Inputs used", "points": "5", "criterion": "...", "tolerance": "..."}, | |
|           {"artifact": "output.json", "dimension": "Calculated value", "points": "5", "criterion": "...", "tolerance": "±5%"}, | |
|           {"artifact": "output.json", "dimension": "Method", "points": "5", "criterion": "...", "tolerance": "..."}, | |
|           {"artifact": "output.R", "dimension": "Reproducibility", "points": "5", "criterion": "...", "tolerance": "..."} | |
|         ] | |
|       } | |
|     ] | |
|   } | |
| } | |
| ``` | |
| ### Review file (`reviews/<trial>__<user>/<stamp>/*.json`) | |
| ```json | |
| { | |
|   "submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json", | |
|   "at": "2026-06-04T16:00:00+00:00", | |
|   "reviewer": "Dr. Lee", | |
|   "status": "needs_fix", | |
|   "note": "still missing the power assumption", | |
|   "question_id": "P-002" | |
| } | |
| ``` | |
| `question_id` ties the review to a specific question; an empty `question_id` | |
| means an overall (whole-version) review. The trial's **current status** is the | |
| most recent *overall* review on the latest version (or `pending` if none). | |
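The status rule above can be written as a short function. This is a sketch, not the Admin page's code: `current_status` is a hypothetical name, it expects only the reviews of the latest version, and it assumes every `at` value is an ISO 8601 timestamp with an explicit offset, as in the example file.

```python
from datetime import datetime


def current_status(latest_version_reviews):
    """Return the status of the newest overall review, or "pending".

    latest_version_reviews: review dicts for the latest version only.
    """
    # Overall reviews are the ones with an empty or missing question_id.
    overall = [r for r in latest_version_reviews if not r.get("question_id")]
    if not overall:
        return "pending"
    newest = max(overall, key=lambda r: datetime.fromisoformat(r["at"]))
    return newest["status"]
```

Per-question reviews are ignored on purpose, so a later `needs_fix` on a single question does not change the trial's overall status.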
| ### Load everything in Python | |
| ```python | |
| import glob | |
| import json | |
| import os | |
| from huggingface_hub import snapshot_download | |
| # The dataset is private: log in first or set HF_TOKEN in the environment. | |
| local = snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset") | |
| def read_json(path): | |
|     with open(path, encoding="utf-8") as fh: | |
|         return json.load(fh) | |
| # every version: submissions/<trial>__<user>/<stamp>.json | |
| submissions = [read_json(f) for f in glob.glob(os.path.join(local, "submissions", "*", "*.json"))] | |
| # reviews: reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json | |
| # key = "<trial>__<user>/<stamp>" (matches a submission's submissionId minus prefix/suffix) | |
| reviews = {} | |
| reviews_root = os.path.join(local, "reviews") | |
| for f in glob.glob(os.path.join(reviews_root, "*", "*", "*.json")): | |
|     # the path relative to reviews/ is <pair>/<stamp>/<file> on any OS | |
|     pair, ver = os.path.relpath(f, reviews_root).split(os.sep)[:2] | |
|     reviews.setdefault(f"{pair}/{ver}", []).append(read_json(f)) | |
| for key in reviews: | |
|     reviews[key].sort(key=lambda r: r["at"])  # oldest first | |
| ``` | |
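Given a list of submission dicts like `submissions` above, the "one row per `trial_id` + `username`" view of the Admin page could be reproduced as follows. `latest_versions` is a hypothetical helper, and it assumes that `version` stamps sort chronologically as plain strings, which holds for fixed-width UTC timestamps but is not guaranteed by this README.

```python
def latest_versions(submissions):
    """Map (trial_id, username) to that pair's newest submission dict.

    Assumption: "version" strings compare in chronological order.
    """
    latest = {}
    for sub in submissions:
        key = (sub["trial_id"], sub["username"])
        if key not in latest or sub["version"] > latest[key]["version"]:
            latest[key] = sub
    return latest
```

The result can be joined with the `reviews` dictionary by building the key `f"{trial_id}__{username}/{version}"` for each entry.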
| ## Project structure | |
| ```text | |
| . | |
| ├── app.py                  # main intake form (entry point for HF Space) | |
| ├── pages/ | |
| │   └── 1_Admin.py          # admin review page (shown in sidebar) | |
| ├── lib/ | |
| │   ├── __init__.py | |
| │   ├── schema.py           # constants, defaults, validators | |
| │   └── storage.py          # HF Dataset I/O + local fs fallback + admin password check | |
| ├── requirements.txt | |
| └── README.md | |
| ``` | |
| ## Privacy notes | |
| - The dataset repo should be **private**. | |
| - `HF_TOKEN` and `ADMIN_PASSWORD` live only in Space secrets; never commit them. | |
| - Rotate the token periodically. | |
| ## Extending with Python ML libs | |
| Adding NLP / model checks is now a few lines in `lib/`. Examples: | |
| - `spaCy` for entity extraction on submitted SAP excerpts | |
| - `sentence-transformers` for semantic dedup of similar questions | |
| - `huggingface_hub.InferenceClient` for LLM-as-judge on the criterion text | |
| - `pandas` directly in the admin page for batch stats / CSV export | |
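As a starting point for the batch stats / CSV export idea, the sketch below flattens submissions into one row per rubric. It deliberately uses only the standard-library `csv` module rather than `pandas`, so it is a stand-in for that bullet rather than an implementation of it; the same row dicts could be passed to `pandas.DataFrame`. The names `FIELDS`, `rubric_rows`, and `to_csv` are hypothetical, and the keys follow the submission JSON shown earlier.

```python
import csv
import io

FIELDS = [
    "trial_id", "username", "version", "question_id", "question_type",
    "artifact", "dimension", "points", "tolerance", "criterion",
]


def rubric_rows(submission):
    """Yield one flat dict per rubric in a submission."""
    for prompt in submission["comparison"]["prompts"]:
        for rubric in prompt["rubrics"]:
            yield {
                "trial_id": submission["trial_id"],
                "username": submission["username"],
                "version": submission["version"],
                "question_id": prompt["id"],
                "question_type": prompt["question_type"],
                "artifact": rubric["artifact"],
                "dimension": rubric["dimension"],
                "points": rubric["points"],
                "tolerance": rubric["tolerance"],
                "criterion": rubric["criterion"],
            }


def to_csv(submissions):
    """Return CSV text with a header and one line per rubric."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for sub in submissions:
        writer.writerows(rubric_rows(sub))
    return buf.getvalue()
```

In the Admin page, the returned text could be offered through Streamlit's `st.download_button`.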