OpenCode committed 20 days ago

Commit d2ed801
1 Parent(s): 54b50db

docs: point all dataset references at build-small-hackathon/fabella-traces

The canonical dataset now lives in the hackathon org namespace. It is
public and has a real card and 5 seed rows. Update every reference in
the repo:

- README.md YAML: add 'datasets: - build-small-hackathon/fabella-traces'
  so the dataset shows up in the Space's Datasets tab.
- README.md body + files list: 4 references moved from Kiy-K to the
  build-small-hackathon namespace; the Sharing is Caring badge note now
  reads as 'claimed' with a link to the live dataset.
- DATASET.md: default dataset path moved to the org namespace; the
  Kiy-K personal-namespace mention is downgraded to a 'local dev
  fallback' note.
- app.py: the 'trace_publication' payload the parent sees in their
  Download my history bundle now points at the org dataset; the settings
  dialog link in INDEX_HTML points at the org dataset; the anonymization
  list gains a per-row UUID note.
- trace.py: DATASET_REPO default switched from Kiy-K/fabella-traces to
  build-small-hackathon/fabella-traces so the live Space publishes to
  the canonical repo on the Space's Datasets tab. The maker's personal
  namespace is now a documented FABELLA_TRACE_REPO fallback for local
  dev only.

No change to code behavior beyond the new default repo and the updated
reference strings. The per-row UUID publisher, the atomic probe, the
FABELLA_SHARE_TRACES=0 default, the per-parent self-export path, and
the rest of the architecture are unchanged.
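
For reviewers, here is a minimal sketch of how the new trace.py default resolves. The helper name `resolve_dataset_repo` is hypothetical; the diff itself reads the variable once at module import via `os.environ.get`.

```python
import os

# New default per this commit; the personal namespace is only a fallback.
ORG_DEFAULT = "build-small-hackathon/fabella-traces"


def resolve_dataset_repo(env=None):
    """Hypothetical helper: return the repo the publisher would target.

    Mirrors the os.environ.get("FABELLA_TRACE_REPO", <default>) call in
    trace.py, but takes the environment as an argument so it can be tested.
    """
    env = os.environ if env is None else env
    return env.get("FABELLA_TRACE_REPO", ORG_DEFAULT)


# With no override, the org dataset is used.
print(resolve_dataset_repo({}))
# Local dev can point at the maker's personal dataset instead.
print(resolve_dataset_repo({"FABELLA_TRACE_REPO": "Kiy-K/fabella-traces"}))
```

Passing the environment in explicitly is a choice made for this sketch only; it is not how the repo is structured.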

Files changed (4)

DATASET.md +9 -7
README.md +7 -5
app.py +4 -3
trace.py +13 -12

DATASET.md CHANGED

@@ -1,20 +1,22 @@
 # Fabella Anonymized Agent Traces
-**Note:** the public dataset was removed by the maker for this demo. Parents pull their own data at any time via **Settings → Download my history** in the running Space. The publisher (`trace.py`), the schema, and the dataset card on https://huggingface.co/datasets/Kiy-K/fabella-traces are all still in the repo for re-deployment. Set `FABELLA_SHARE_TRACES=1` on the Space to resume Hub publish.
-This is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card is the source of truth; this file is a convenience for reviewers reading the source repo.
 ## Why this exists
 Fabella is a small Hugging Face Space that drafts short, kind, age-appropriate explanations a parent can use to talk to their child about a hard thing (a hospitalization, a move, a pet dying). The first version is generated by `google/gemma-4-E4B-it`; a `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` judge scores it against a six-criterion rubric; the drafter revises if needed. An optional read-aloud uses `openbmb/VoxCPM2`.
-When the public publisher is enabled, the dataset captures the full ReAct loop for every successful generation, with PII removed before the row leaves the Space. It's the **Sharing is Caring** merit-badge artifact for the [Build Small Hackathon](https://huggingface.co/spaces/build-small-hackathon/README) submission.
 ## Where it lives
-- **Live dataset (when enabled):** https://huggingface.co/datasets/Kiy-K/fabella-traces (default)
-- **Override env var:** `FABELLA_TRACE_REPO=build-small-hackathon/fabella-traces` (only works if an org admin pre-creates the dataset there — the Space's HF_TOKEN is contributor-level and cannot create org-namespace repos)
-- **Currently:** disabled by default for this demo; data lives in the per-parent bucket and is exportable via the Space's **Settings → Download my history** button.
 The dataset repo is created on first publish by `trace.py::_ensure_repo` via `HfApi.create_repo(exist_ok=True)`, so a fresh deployment does not require a one-time setup step.
@@ -80,7 +82,7 @@ The download path exists for two reasons: (1) the parent can verify that what we
 ## Opt-out
-- Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the new default).
 - Or pass `share_trace=False` on `make_explanation` to opt out per request.
 The default for this demo is **off**: rows only live in the per-parent bucket, accessible via the Download button.

 # Fabella Anonymized Agent Traces
+When the public publisher is enabled, one anonymized row per request lands at
+[`build-small-hackathon/fabella-traces`](https://huggingface.co/datasets/build-small-hackathon/fabella-traces).
+The dataset repo is public and contains the live card, the publisher schema, and 5 seed rows for the data viewer. The Space publishes to it only when `FABELLA_SHARE_TRACES=1` is set on the Space (default `0`); with the flag off, rows only live in the per-parent bucket and are exportable via the Space's **Settings → Download my history** button.
+This file is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card on the Hub is the source of truth; this file is a convenience for reviewers reading the source repo.
 ## Why this exists
 Fabella is a small Hugging Face Space that drafts short, kind, age-appropriate explanations a parent can use to talk to their child about a hard thing (a hospitalization, a move, a pet dying). The first version is generated by `google/gemma-4-E4B-it`; a `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` judge scores it against a six-criterion rubric; the drafter revises if needed. An optional read-aloud uses `openbmb/VoxCPM2`.
+The dataset captures the full ReAct loop for every successful generation, with PII removed before the row leaves the Space. It's the **Sharing is Caring** merit-badge artifact for the [Build Small Hackathon](https://huggingface.co/spaces/build-small-hackathon/README) submission.
 ## Where it lives
+- **Live dataset (when enabled):** https://huggingface.co/datasets/build-small-hackathon/fabella-traces (default)
+- **Personal-namespace fallback:** set `FABELLA_TRACE_REPO=Kiy-K/fabella-traces` to publish to the maker's personal dataset instead. Used for local dev where the Space's HF_TOKEN can't create in the org namespace.
+- **Default for this demo:** the publisher is OFF (`FABELLA_SHARE_TRACES=0`); data lives in the per-parent bucket and is exportable via the Space's **Settings → Download my history** button.
 The dataset repo is created on first publish by `trace.py::_ensure_repo` via `HfApi.create_repo(exist_ok=True)`, so a fresh deployment does not require a one-time setup step.
 ## Opt-out
+- Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the default).
 - Or pass `share_trace=False` on `make_explanation` to opt out per request.
 The default for this demo is **off**: rows only live in the per-parent bucket, accessible via the Download button.
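
Reviewer note: the two opt-out switches described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `trace.py`. The function names are hypothetical, the truthy tuple lists only the two values visible in this diff (the real tuple is cut off), and requiring both switches to allow publishing is an assumed combination rule.

```python
# Only the values visible in the diff; the actual tuple may contain more.
TRUTHY = ("1", "true")


def publisher_enabled(env):
    """Space-wide switch: FABELLA_SHARE_TRACES, defaulting to "0" (off)."""
    return env.get("FABELLA_SHARE_TRACES", "0").lower() in TRUTHY


def should_publish(env, share_trace=True):
    """Assumed rule: publish only if the Space-wide flag is on AND the
    per-request share_trace argument has not opted out."""
    return publisher_enabled(env) and share_trace


print(should_publish({}))                                   # flag unset
print(should_publish({"FABELLA_SHARE_TRACES": "1"}))        # flag on
print(should_publish({"FABELLA_SHARE_TRACES": "1"}, False)) # per-request opt-out
```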

README.md CHANGED

@@ -10,6 +10,8 @@ pinned: true
 hf_oauth: true
 license: apache-2.0
 short_description: Small words for big questions.
 tags:
   - track:backyard
   - sponsor:openbmb
@@ -135,7 +137,7 @@ Three claimed, three skipped. Fabella's honest inventory:
 | Badge | Status | Why |
 |---|---|---|
 | **Off-Brand** 🎨 | Claimed | Custom HTML+CSS+JS frontend served by `gradio.Server` — zero default Gradio chrome. |
-| **Sharing is Caring** 📡 | Re-scoped | For this demo, the public dataset was removed by the maker. Parents pull their own data at any time via the **Download my history** button in the settings dialog (calls `GET /api/history/download` and returns a JSON bundle of their chat + memory + a `trace_publication` statement). The `trace.py` publisher and `Kiy-K/fabella-traces` schema are still in the repo for re-deployment (set `FABELLA_SHARE_TRACES=1` to resume Hub publish). |
 | **Field Notes** 📓 | Claimed | Blog/report on what was built and learned, by the maker. |
 | **Off the Grid** 🔌 | Skipped | Drafter, judge, and TTS all run on Modal — a cloud GPU platform, not "in front of you." |
 | **Well-Tuned** 🎯 | Skipped | No fine-tuning; Gemma 4 E4B-IT and Nemotron Nano 4B are used stock, no PEFT/LoRA, no published checkpoint on the Hub. |
@@ -168,8 +170,8 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
   "messages": [{"role": "parent", "content": "...", "age": 7, "tone": "gentle", "created_at": "..."}, ...],
   "memory": {"facts": [...], "summary": "...", "threads": [...], "history_turns": 4},
   "trace_publication": {
-    "dataset": "Kiy-K/fabella-traces",
-    "url": "https://huggingface.co/datasets/Kiy-K/fabella-traces",
     "this_session_max_published_rows": 3,
     "this_session_max_turns": 4,
     "anonymization": [
@@ -182,7 +184,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
 }
 ```
-**Re-deploying the public dataset:** the `trace.py` publisher and the `Kiy-K/fabella-traces` schema are still in the repo. Set `FABELLA_SHARE_TRACES=1` on the Space to resume writing rows to that dataset. With the env var unset (or `0`), the publisher is a no-op and rows only live in the per-parent bucket.
 ---
@@ -196,7 +198,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
 - `modal_app.py` — Modal deployment (drafter + judge on A10G, VoxCPM2 TTS on L4)
 - `memory.py` — bucket-backed parent memory and preference summaries for follow-up continuity
 - `safety.py` — input sanitization, profanity block, `explain_to_words(tone)`
-- `trace.py` — anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/Kiy-K/fabella-traces) dataset
 ---

 hf_oauth: true
 license: apache-2.0
 short_description: Small words for big questions.
+datasets:
+  - build-small-hackathon/fabella-traces
 tags:
   - track:backyard
   - sponsor:openbmb
 | Badge | Status | Why |
 |---|---|---|
 | **Off-Brand** 🎨 | Claimed | Custom HTML+CSS+JS frontend served by `gradio.Server` — zero default Gradio chrome. |
+| **Sharing is Caring** 📡 | Claimed | One anonymized row per request lands at [build-small-hackathon/fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) — schema, anonymization, and 5 seed rows in the public card. The Space publishes only when `FABELLA_SHARE_TRACES=1` is set (default `0`); parents can always pull their own data via the **Download my history** button regardless. |
 | **Field Notes** 📓 | Claimed | Blog/report on what was built and learned, by the maker. |
 | **Off the Grid** 🔌 | Skipped | Drafter, judge, and TTS all run on Modal — a cloud GPU platform, not "in front of you." |
 | **Well-Tuned** 🎯 | Skipped | No fine-tuning; Gemma 4 E4B-IT and Nemotron Nano 4B are used stock, no PEFT/LoRA, no published checkpoint on the Hub. |
   "messages": [{"role": "parent", "content": "...", "age": 7, "tone": "gentle", "created_at": "..."}, ...],
   "memory": {"facts": [...], "summary": "...", "threads": [...], "history_turns": 4},
   "trace_publication": {
+    "dataset": "build-small-hackathon/fabella-traces",
+    "url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
     "this_session_max_published_rows": 3,
     "this_session_max_turns": 4,
     "anonymization": [
 }
 ```
+**Re-deploying the public dataset:** the `trace.py` publisher and the `build-small-hackathon/fabella-traces` schema are still in the repo. Set `FABELLA_SHARE_TRACES=1` on the Space to resume writing rows to that dataset. With the env var unset (or `0`), the publisher is a no-op and rows only live in the per-parent bucket.
 ---
 - `modal_app.py` — Modal deployment (drafter + judge on A10G, VoxCPM2 TTS on L4)
 - `memory.py` — bucket-backed parent memory and preference summaries for follow-up continuity
 - `safety.py` — input sanitization, profanity block, `explain_to_words(tone)`
+- `trace.py` — anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) dataset
 ---

app.py CHANGED

@@ -660,8 +660,8 @@ async def api_history_download(request: Request, session_id: str = ""):
         "messages": _public_messages(history.get("messages", [])),
         "memory": memory_layer.public_view(mem),
         "trace_publication": {
-            "dataset": "Kiy-K/fabella-traces",
-            "url": "https://huggingface.co/datasets/Kiy-K/fabella-traces",
             "this_session_max_published_rows": shared_count,
             "this_session_max_turns": turn_count,
             "anonymization": [
@@ -669,6 +669,7 @@ async def api_history_download(request: Request, session_id: str = ""):
                 "Raw situation text is never stored; only its SHA-256 hash, the first 60 chars, and its length are kept.",
                 "Freeform history turns are replaced with role + length counts in the published row.",
                 "The drafter's static system prompt is shipped in full (it's a public string in this repo).",
             ],
         },
     }
@@ -1110,7 +1111,7 @@ a { color: var(--accent-strong); }
     <p style="margin: 0 0 12px; font: 400 13px/1.4 var(--font-sans); color: var(--text-soft);">
       Your chat history and memory are stored in this Space's bucket, keyed to you.
       When you opt in, redacted copies of the drafter/judge trace are published to a
-      <a href="https://huggingface.co/datasets/Kiy-K/fabella-traces" target="_blank" rel="noopener">public dataset</a>
       — they contain no raw situation text, no child name, and no trace that links back to you.
     </p>
     <div style="display:flex; gap: 8px; flex-wrap: wrap;">

         "messages": _public_messages(history.get("messages", [])),
         "memory": memory_layer.public_view(mem),
         "trace_publication": {
+            "dataset": "build-small-hackathon/fabella-traces",
+            "url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
             "this_session_max_published_rows": shared_count,
             "this_session_max_turns": turn_count,
             "anonymization": [
                 "Raw situation text is never stored; only its SHA-256 hash, the first 60 chars, and its length are kept.",
                 "Freeform history turns are replaced with role + length counts in the published row.",
                 "The drafter's static system prompt is shipped in full (it's a public string in this repo).",
+                "Each row is its own data/<trace_id>.json file (race-free across Space replicas).",
             ],
         },
     }
     <p style="margin: 0 0 12px; font: 400 13px/1.4 var(--font-sans); color: var(--text-soft);">
       Your chat history and memory are stored in this Space's bucket, keyed to you.
       When you opt in, redacted copies of the drafter/judge trace are published to a
+      <a href="https://huggingface.co/datasets/build-small-hackathon/fabella-traces" target="_blank" rel="noopener">public dataset</a>
       — they contain no raw situation text, no child name, and no trace that links back to you.
     </p>
     <div style="display:flex; gap: 8px; flex-wrap: wrap;">
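
Reviewer note: the anonymization statements in the `trace_publication` payload can be illustrated with a short sketch. The function names and dictionary keys below are hypothetical and are not taken from `trace.py`. Only the three retained quantities (SHA-256 hash, first 60 characters, length) and the `data/<trace_id>.json` layout come from the strings in this diff; generating the ID with `uuid4` is an assumption based on the commit message's "per-row UUID" wording.

```python
import hashlib
import uuid


def anonymize_situation(text):
    """Keep only the hash, a 60-character prefix, and the length."""
    return {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "prefix": text[:60],
        "length": len(text),
    }


def row_path(trace_id=None):
    """One file per row, so concurrent replicas never write the same path."""
    trace_id = trace_id or uuid.uuid4().hex  # assumed ID scheme
    return f"data/{trace_id}.json"


summary = anonymize_situation("Our dog died last week and my son keeps asking where she went.")
print(summary["length"], len(summary["prefix"]))
print(row_path("example-id"))
```

Writing each row to its own file avoids the read-modify-write race that a single shared file would have across Space replicas, which is the property the new anonymization note describes.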

trace.py CHANGED

@@ -88,14 +88,15 @@ log = logging.getLogger("fabella.traces")
 # --- Configuration ---------------------------------------------------------
-# Where the public dataset lives. The default is the user's personal
-# namespace (``Kiy-K/fabella-traces``) because the build-small-hackathon
-# org's tokens are contributor-level and can't create new repos. To
-# publish to the org, an admin must pre-create the dataset and the
-# Space owner must override ``FABELLA_TRACE_REPO`` to the org path.
 DATASET_REPO = os.environ.get(
     "FABELLA_TRACE_REPO",
-    "Kiy-K/fabella-traces",
 )
 # Buffer flush triggers. The background flusher will push whenever EITHER
@@ -111,12 +112,12 @@ FLUSH_INTERVAL_S = float(os.environ.get("FABELLA_TRACE_FLUSH_INTERVAL_S", "300")
 DATASET_DIR = "data"
 PROBE_PATH = f"{DATASET_DIR}/.probe.json"
-# Capture is OFF by default. The public dataset was removed by the maker
-# for this demo; the only path to data is the per-parent "Download my
-# history" self-export button in ``app.py::api_history_download``. To
-# re-enable the public dataset for re-deployment, set
-# ``FABELLA_SHARE_TRACES=1`` on the Space and the publisher will resume
-# writing to ``Kiy-K/fabella-traces``.
 SHARE_TRACES = os.environ.get("FABELLA_SHARE_TRACES", "0").lower() in (
     "1",
     "true",

 # --- Configuration ---------------------------------------------------------
+# Where the public dataset lives. The default is the hackathon org's
+# namespace (``build-small-hackathon/fabella-traces``) so the live Space
+# publishes to the canonical repo attached to the Space's "Datasets" tab.
+# For local dev where the Space's HF_TOKEN cannot create repos in the org
+# namespace, set ``FABELLA_TRACE_REPO=Kiy-K/fabella-traces`` (the maker's
+# personal dataset) to publish to a fallback path.
 DATASET_REPO = os.environ.get(
     "FABELLA_TRACE_REPO",
+    "build-small-hackathon/fabella-traces",
 )
 # Buffer flush triggers. The background flusher will push whenever EITHER
 DATASET_DIR = "data"
 PROBE_PATH = f"{DATASET_DIR}/.probe.json"
+# Capture is OFF by default. The dataset card and schema are live at
+# ``build-small-hackathon/fabella-traces``; the only path to data right
+# now is the per-parent "Download my history" self-export button in
+# ``app.py::api_history_download``. To re-enable publishing for a new
+# deployment, set ``FABELLA_SHARE_TRACES=1`` on the Space and the
+# publisher will resume writing rows to the org dataset.
 SHARE_TRACES = os.environ.get("FABELLA_SHARE_TRACES", "0").lower() in (
     "1",
     "true",