title: TDB Intake
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
Trial Design Benchmark: Intake
A Streamlit intake form for trial statisticians. Submissions are saved to a Hugging Face Dataset repo. An Admin page (in the sidebar) lets reviewers triage submissions (pending / reviewed / needs_fix).
What it does
- Form (`app.py`): statisticians enter `trial_id`, `username`, and a list of questions. Each question has:
  - `design_element` (dropdown; when "Others" is picked, a free-text input appears)
  - `question_type` (dropdown: `extraction_only` / `derivation_required`)
  - `question` (free text)
  - Rubrics auto-generated by question type:
    - `extraction_only` → 1 rubric: `output.json`
    - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility}
  - Each rubric collects `points`, `tolerance`, `criterion`.
- Versions: every Submit saves a new version. Re-enter the same `trial_id` + `username`, click Find versions, pick one, and Load selected version to pull it back into the form for editing; Submit then saves a new version.
- Admin page (`pages/1_Admin.py`): password-gated review console. Shows only the latest version of each trial (one row per `trial_id` + `username`). The questionnaire is rendered in the same layout as the form (read-only). Reviewers can add reviews per question and an overall review; review history covers all versions (each review tagged with its version, and per-question reviews tied to their question). The trial's current status reflects the latest version's most recent overall review. Each review is its own file under `reviews/<trial>__<user>/<version>/`. (Submitters can still see and load all their own versions on the form.)
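The rubric rules above can be sketched in a few lines of Python. This is only an illustration, not the code in `app.py` or `lib/schema.py`: the names `RUBRIC_TEMPLATES` and `default_rubrics` are hypothetical, and the empty dimension for `extraction_only` is an assumption, since this README names none.

```python
# Hypothetical sketch: names are illustrative, not taken from the app's source.
RUBRIC_TEMPLATES = {
    # (artifact, dimension) pairs per question type
    "extraction_only": [("output.json", "")],  # dimension left empty (assumption)
    "derivation_required": [
        ("output.json", "Inputs used"),
        ("output.json", "Calculated value"),
        ("output.json", "Method"),
        ("output.R", "Reproducibility"),
    ],
}


def default_rubrics(question_type):
    """Return blank rubric rows for a question type, ready to be filled in."""
    try:
        templates = RUBRIC_TEMPLATES[question_type]
    except KeyError:
        raise ValueError(f"unknown question_type: {question_type!r}") from None
    return [
        {"artifact": artifact, "dimension": dimension,
         "points": "", "criterion": "", "tolerance": ""}
        for artifact, dimension in templates
    ]
```

Keeping the mapping in one table means a new question type would need only a new entry, with no change to the function.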
Run locally
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
Without HF env vars set, submissions land in `./data/submissions/<...>.json` on disk, which is fine for dev.
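One way such a fallback could work is sketched below. The real logic lives in `lib/storage.py` and may differ; `save_json` and its `local_root` parameter are hypothetical names. The Hub branch uses `HfApi.upload_file` from `huggingface_hub`, imported lazily so the local path works without that package installed.

```python
import json
import os
from pathlib import Path


def save_json(rel_path, payload, local_root="data"):
    """Write payload to the HF dataset repo if configured, else to local disk.

    Illustrative sketch only; not the app's actual storage code.
    Returns a (backend, location) tuple.
    """
    body = json.dumps(payload, indent=2, ensure_ascii=False).encode("utf-8")
    token = os.environ.get("HF_TOKEN")
    repo = os.environ.get("HF_DATASET_REPO")
    if token and repo:
        # Lazy import: only needed when the Hub backend is configured.
        from huggingface_hub import HfApi

        HfApi(token=token).upload_file(
            path_or_fileobj=body,
            path_in_repo=rel_path,
            repo_id=repo,
            repo_type="dataset",
            revision=os.environ.get("HF_DATASET_BRANCH", "main"),
        )
        return ("hub", rel_path)
    # Local fallback: mirror the repo layout under local_root.
    target = Path(local_root) / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return ("local", str(target))
```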
Deploy on Hugging Face Spaces
1. Create a private HF Dataset repo
- Sign in at https://huggingface.co
- Click your avatar → New Dataset
- Owner: your username (e.g. `ttt-77`)
- Name: e.g. `tdb-intake-submissions`
- Visibility: Private
- Create. Leave it empty.
2. Generate an HF access token
- https://huggingface.co/settings/tokens → New token
- Token type: Write
- Save the `hf_...` string.
3. Create the Space
- Click your avatar → New Space
- Name: e.g. `tdb-intake`
- SDK: Streamlit
- Visibility: your choice (public works; the form is intended for public submission, only data needs to be private)
- Create → HF gives you a git repo URL.
4. Push this code to the Space
git remote add hf https://huggingface.co/spaces/<your-username>/tdb-intake
git push hf main
Or, in the HF Space's Settings → Repository, link this GitHub repo and HF will auto-sync on push.
5. Add Space secrets
In the Space → Settings → Variables and secrets → add as secrets:
| Name | Value |
|---|---|
| `HF_TOKEN` | the token from step 2 |
| `HF_DATASET_REPO` | `<your-username>/tdb-intake-submissions` |
| `HF_DATASET_BRANCH` | `main` (optional, defaults to `main`) |
| `ADMIN_PASSWORD` | a password to share with reviewers |
The Space will restart automatically and pick up the new secrets.
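Space secrets reach the app as environment variables. A minimal sketch of reading them and checking the admin password might look like the following; `load_settings` and `check_admin_password` are hypothetical names, and the app's own check in `lib/storage.py` may be implemented differently.

```python
import hmac
import os


def load_settings(env=os.environ):
    """Collect the secrets listed above, applying the documented branch default."""
    return {
        "token": env.get("HF_TOKEN"),
        "repo": env.get("HF_DATASET_REPO"),
        "branch": env.get("HF_DATASET_BRANCH", "main"),
    }


def check_admin_password(candidate, env=os.environ):
    """Return True only if ADMIN_PASSWORD is set and equals candidate."""
    expected = env.get("ADMIN_PASSWORD", "")
    if not expected:
        return False  # nothing configured: deny rather than allow
    # compare_digest takes time independent of where the strings differ
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```

Passing `env` explicitly makes both helpers easy to exercise with a plain dict instead of real secrets.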
6. Test
- Open the Space URL → fill the form → Submit. A file lands in `submissions/<trial_id>__<username>/<stamp>.json` in the dataset repo. Submitting again saves another version in the same folder.
- Open the Admin page (left sidebar) → enter password → see the submission with status `pending` → add a review (your name + status + comment). It appears in the review timeline and a new file lands under `reviews/<submission>/`. Add more reviews to build up the history.
Dataset layout
Every submit saves a new version under a per-pair folder; nothing is overwritten, so the full version history is kept and any version can be loaded back. Each review is a separate file keyed to a specific version, so a version can be reviewed many times by different people and concurrent reviews never conflict.
submissions/<trial>__<user>/<stamp>.json # one file per version
reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json # one file per review of that version
To load/edit a previous version: on the form, enter the same trial_id +
username, click Find versions, pick a version, click Load selected
version, edit, then Submit (which saves a new version).
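The layout above maps onto a handful of path helpers. The sketch below is illustrative: the function names are hypothetical, `<rev>` is assumed to be a reviewer identifier, and `list_versions` assumes version stamps sort chronologically as plain strings (true for same-format UTC timestamps). The stamps in the usage example are placeholders, not the app's real format.

```python
def pair_folder(trial_id, username):
    """Folder name shared by all versions of one trial/user pair."""
    return f"{trial_id}__{username}"


def submission_path(trial_id, username, stamp):
    """Repo-relative path of one submission version."""
    return f"submissions/{pair_folder(trial_id, username)}/{stamp}.json"


def review_path(trial_id, username, stamp, review_stamp, reviewer):
    """Repo-relative path of one review of one version."""
    folder = pair_folder(trial_id, username)
    return f"reviews/{folder}/{stamp}/{review_stamp}__{reviewer}.json"


def list_versions(paths, trial_id, username):
    """Version stamps for one pair, oldest first, from repo-relative paths."""
    prefix = f"submissions/{pair_folder(trial_id, username)}/"
    suffix = ".json"
    stamps = [
        p[len(prefix):-len(suffix)]
        for p in paths
        if p.startswith(prefix) and p.endswith(suffix)
    ]
    return sorted(stamps)


# Usage with placeholder stamps "S1" and "S2":
paths = [submission_path("NCT0001", "jdoe", s) for s in ("S2", "S1")]
versions = list_versions(paths, "NCT0001", "jdoe")
```

Because every review path embeds its own review stamp and reviewer, two reviewers writing at the same time produce different files, which is why concurrent reviews do not conflict.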
Submission file (submissions/<trial>__<user>/<stamp>.json)
{
"submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json",
"version": "2026-06-04T...Z",
"submittedAt": "2026-06-04T...",
"trial_id": "NCT0001",
"username": "jdoe",
"comparison": {
"trial_id": "NCT0001",
"username": "jdoe",
"prompts": [
{
"id": "P-001",
"design_element": "Sample size and power",
"design_element_other": "",
"question": "Total target PFS events",
"question_type": "derivation_required",
"rubrics": [
{"artifact": "output.json", "dimension": "Inputs used", "points": "5", "criterion": "...", "tolerance": "..."},
{"artifact": "output.json", "dimension": "Calculated value", "points": "5", "criterion": "...", "tolerance": "±5%"},
{"artifact": "output.json", "dimension": "Method", "points": "5", "criterion": "...", "tolerance": "..."},
{"artifact": "output.R", "dimension": "Reproducibility", "points": "5", "criterion": "...", "tolerance": "..."}
]
}
]
}
}
Review file (reviews/<trial>__<user>/<stamp>/*.json)
{
"submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json",
"at": "2026-06-04T16:00:00+00:00",
"reviewer": "Dr. Lee",
"status": "needs_fix",
"note": "still missing the power assumption",
"question_id": "P-002"
}
question_id ties the review to a specific question; an empty question_id
means an overall (whole-version) review. The trial's current status is the
most recent overall review on the latest version (or pending if none).
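The status rule can be written as a short function. This is a sketch of the rule as described here, not the admin page's code; `current_status` is a hypothetical name, and it expects the list of review dicts belonging to the latest version only.

```python
from datetime import datetime


def current_status(reviews_for_latest_version):
    """Status of a trial, derived from the reviews of its latest version."""
    # Overall reviews are the ones with an empty (or missing) question_id.
    overall = [r for r in reviews_for_latest_version if not r.get("question_id")]
    if not overall:
        return "pending"
    # Parse "at" so the comparison is by time, not by string.
    newest = max(overall, key=lambda r: datetime.fromisoformat(r["at"]))
    return newest["status"]
```

Per-question reviews never change the status under this rule, even when they are newer than the last overall review.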
Load everything in Python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Private repo: needs HF_TOKEN in the environment or a cached login.
local = Path(snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset"))

def read_json(path):
    # Close each file promptly instead of leaving handles open.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# every version: submissions/<trial>__<user>/<stamp>.json
submissions = [read_json(p) for p in sorted(local.glob("submissions/*/*.json"))]

# reviews: reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json
# key = "<trial>__<user>/<stamp>" (matches a submission's submissionId minus prefix/suffix)
reviews = {}
for p in local.glob("reviews/*/*/*.json"):
    # Path parts avoid splitting on "/", which breaks on Windows.
    pair, ver = p.relative_to(local / "reviews").parts[:2]
    reviews.setdefault(f"{pair}/{ver}", []).append(read_json(p))
for key in reviews:
    reviews[key].sort(key=lambda r: r["at"])  # oldest first
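Once everything is loaded, picking the latest version per trial/user pair (what the Admin page shows) and finding its reviews could look like this. The helper names are hypothetical, and the comparison assumes `version` strings sort chronologically as plain strings; the sample data uses placeholder stamps.

```python
def review_key(submission):
    """Turn a submissionId into the "<trial>__<user>/<stamp>" reviews key."""
    sid = submission["submissionId"]
    return sid[len("submissions/"):-len(".json")]


def latest_per_pair(submissions):
    """Map (trial_id, username) to that pair's newest submission."""
    latest = {}
    for sub in submissions:
        pair = (sub["trial_id"], sub["username"])
        if pair not in latest or sub["version"] > latest[pair]["version"]:
            latest[pair] = sub
    return latest


# Tiny in-memory example with placeholder stamps "S1" and "S2".
sample_submissions = [
    {"submissionId": f"submissions/NCT0001__jdoe/{v}.json", "version": v,
     "trial_id": "NCT0001", "username": "jdoe"}
    for v in ("S1", "S2")
]
sample_reviews = {"NCT0001__jdoe/S2": [{"status": "reviewed", "question_id": ""}]}

newest = latest_per_pair(sample_submissions)[("NCT0001", "jdoe")]
newest_reviews = sample_reviews.get(review_key(newest), [])
```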
Project structure
.
├── app.py                  # main intake form (entry point for HF Space)
├── pages/
│   └── 1_Admin.py          # admin review page (shown in sidebar)
├── lib/
│   ├── __init__.py
│   ├── schema.py           # constants, defaults, validators
│   └── storage.py          # HF Dataset I/O + local fs fallback + admin password check
├── requirements.txt
└── README.md
Privacy notes
- The dataset repo should be private.
- `HF_TOKEN` and `ADMIN_PASSWORD` live only in Space secrets; never commit them.
- Rotate the token periodically.
Extending with Python ML libs
Adding NLP / model checks is now a few lines in lib/. Examples:
- `spaCy` for entity extraction on submitted SAP excerpts
- `sentence-transformers` for semantic dedup of similar questions
- `huggingface_hub.InferenceClient` for LLM-as-judge on the criterion text
- `pandas` directly in the admin page for batch stats / CSV export
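As a dependency-free starting point for the CSV export idea, the sketch below flattens submissions into one row per rubric using the standard-library `csv` module instead of `pandas`. The function names and column order are hypothetical; the field names follow the submission file format shown earlier.

```python
import csv
import io

FIELDS = ["trial_id", "username", "version", "question_id", "question_type",
          "artifact", "dimension", "points", "tolerance", "criterion"]


def rubric_rows(submission):
    """Yield one flat dict per rubric in a submission."""
    for prompt in submission["comparison"]["prompts"]:
        for rubric in prompt["rubrics"]:
            yield {
                "trial_id": submission["trial_id"],
                "username": submission["username"],
                "version": submission["version"],
                "question_id": prompt["id"],
                "question_type": prompt["question_type"],
                "artifact": rubric.get("artifact", ""),
                "dimension": rubric.get("dimension", ""),
                "points": rubric.get("points", ""),
                "tolerance": rubric.get("tolerance", ""),
                "criterion": rubric.get("criterion", ""),
            }


def to_csv(submissions):
    """Return CSV text with a header row and one line per rubric."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for sub in submissions:
        writer.writerows(rubric_rows(sub))
    return buf.getvalue()
```

The same row dicts could be handed to `pandas.DataFrame` for batch statistics if `pandas` is already in `requirements.txt`.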