OpenCode committed on
Commit d2ed801 · 1 Parent(s): 54b50db

docs: point all dataset references at build-small-hackathon/fabella-traces

The canonical dataset now lives in the hackathon org namespace. It is
public, with a real card and 5 seed rows. Update every reference in the
repo:

- README.md YAML: add 'datasets: - build-small-hackathon/fabella-traces'
so the dataset shows up in the Space's Datasets tab.
- README.md body + files list: 4 references moved from Kiy-K to the
build-small-hackathon namespace; the Sharing is Caring badge note now
reads as 'claimed' with a link to the live dataset.
- DATASET.md: default dataset path moved to the org namespace; the
Kiy-K personal-namespace mention is downgraded to a 'local dev
fallback' note.
- app.py: the 'trace_publication' payload the parent sees in their
Download my history bundle now points at the org dataset; the settings
dialog link in INDEX_HTML points at the org dataset; the anonymization
list gains a per-row UUID note.
- trace.py: DATASET_REPO default switched from Kiy-K/fabella-traces to
build-small-hackathon/fabella-traces so the live Space publishes to
the canonical repo on the Space's Datasets tab. The maker's personal
namespace is now a documented FABELLA_TRACE_REPO fallback for local
dev only.

No code behavior change. The per-row UUID publisher, the atomic probe,
the FABELLA_SHARE_TRACES=0 default, the per-parent self-export path,
and the rest of the architecture are unchanged.

Files changed (4)
  1. DATASET.md +9 -7
  2. README.md +7 -5
  3. app.py +4 -3
  4. trace.py +13 -12
DATASET.md CHANGED
@@ -1,20 +1,22 @@
  # Fabella Anonymized Agent Traces

- **Note:** the public dataset was removed by the maker for this demo. Parents pull their own data at any time via **Settings → Download my history** in the running Space. The publisher (`trace.py`), the schema, and the dataset card on https://huggingface.co/datasets/Kiy-K/fabella-traces are all still in the repo for re-deployment. Set `FABELLA_SHARE_TRACES=1` on the Space to resume Hub publish.
+ When the public publisher is enabled, one anonymized row per request lands at
+ [`build-small-hackathon/fabella-traces`](https://huggingface.co/datasets/build-small-hackathon/fabella-traces).
+ The dataset repo is public and contains the live card, the publisher schema, and 5 seed rows for the data viewer. The Space publishes to it only when `FABELLA_SHARE_TRACES=1` is set on the Space (default `0`); with the flag off, rows only live in the per-parent bucket and are exportable via the Space's **Settings → Download my history** button.

- This is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card is the source of truth; this file is a convenience for reviewers reading the source repo.
+ This file is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card on the Hub is the source of truth; this file is a convenience for reviewers reading the source repo.

  ## Why this exists

  Fabella is a small Hugging Face Space that drafts short, kind, age-appropriate explanations a parent can use to talk to their child about a hard thing (a hospitalization, a move, a pet dying). The first version is generated by `google/gemma-4-E4B-it`; a `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` judge scores it against a six-criterion rubric; the drafter revises if needed. An optional read-aloud uses `openbmb/VoxCPM2`.

- When the public publisher is enabled, the dataset captures the full ReAct loop for every successful generation, with PII removed before the row leaves the Space. It's the **Sharing is Caring** merit-badge artifact for the [Build Small Hackathon](https://huggingface.co/spaces/build-small-hackathon/README) submission.
+ The dataset captures the full ReAct loop for every successful generation, with PII removed before the row leaves the Space. It's the **Sharing is Caring** merit-badge artifact for the [Build Small Hackathon](https://huggingface.co/spaces/build-small-hackathon/README) submission.

  ## Where it lives

- - **Live dataset (when enabled):** https://huggingface.co/datasets/Kiy-K/fabella-traces (default)
- - **Override env var:** `FABELLA_TRACE_REPO=build-small-hackathon/fabella-traces` (only works if an org admin pre-creates the dataset there - the Space's HF_TOKEN is contributor-level and cannot create org-namespace repos)
- - **Currently:** disabled by default for this demo; data lives in the per-parent bucket and is exportable via the Space's **Settings → Download my history** button.
+ - **Live dataset (when enabled):** https://huggingface.co/datasets/build-small-hackathon/fabella-traces (default)
+ - **Personal-namespace fallback:** set `FABELLA_TRACE_REPO=Kiy-K/fabella-traces` to publish to the maker's personal dataset instead. Used for local dev where the Space's HF_TOKEN can't create in the org namespace.
+ - **Default for this demo:** the publisher is OFF (`FABELLA_SHARE_TRACES=0`); data lives in the per-parent bucket and is exportable via the Space's **Settings → Download my history** button.

  The dataset repo is created on first publish by `trace.py::_ensure_repo` via `HfApi.create_repo(exist_ok=True)`, so a fresh deployment does not require a one-time setup step.

@@ -80,7 +82,7 @@ The download path exists for two reasons: (1) the parent can verify that what we

  ## Opt-out

- - Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the new default).
+ - Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the default).
  - Or pass `share_trace=False` on `make_explanation` to opt out per request.

  The default for this demo is **off**: rows only live in the per-parent bucket, accessible via the Download button.
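The DATASET.md text above says the repo is created on first publish through `HfApi.create_repo(exist_ok=True)`. The following is a minimal sketch of that idempotent-create pattern, not the actual body of `trace.py::_ensure_repo`, which this diff does not show. The names `ensure_dataset_repo` and `RecordingApi` are hypothetical; the API object is passed in so the sketch runs without network access.

```python
from typing import Any

# Default taken from the diff; the helper names below are illustrative only.
DEFAULT_REPO = "build-small-hackathon/fabella-traces"


def ensure_dataset_repo(api: Any, repo_id: str = DEFAULT_REPO) -> str:
    """Create the dataset repo if missing; do nothing if it already exists.

    `api` is expected to behave like `huggingface_hub.HfApi`. With
    `exist_ok=True`, calling this on every publish is safe.
    """
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    return repo_id


class RecordingApi:
    """Offline stand-in for HfApi that records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def create_repo(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


if __name__ == "__main__":
    api = RecordingApi()
    ensure_dataset_repo(api)
    print(api.calls)
```

In a real deployment one would presumably pass `huggingface_hub.HfApi()` with a token that has write access to the target namespace; per the notes above, a contributor-level token may not be able to create repos in the org namespace.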
README.md CHANGED
@@ -10,6 +10,8 @@ pinned: true
  hf_oauth: true
  license: apache-2.0
  short_description: Small words for big questions.
+ datasets:
+ - build-small-hackathon/fabella-traces
  tags:
  - track:backyard
  - sponsor:openbmb
@@ -135,7 +137,7 @@ Three claimed, three skipped. Fabella's honest inventory:
  | Badge | Status | Why |
  |---|---|---|
  | **Off-Brand** 🎨 | Claimed | Custom HTML+CSS+JS frontend served by `gradio.Server` - zero default Gradio chrome. |
- | **Sharing is Caring** 📡 | Re-scoped | For this demo, the public dataset was removed by the maker. Parents pull their own data at any time via the **Download my history** button in the settings dialog (calls `GET /api/history/download` and returns a JSON bundle of their chat + memory + a `trace_publication` statement). The `trace.py` publisher and `Kiy-K/fabella-traces` schema are still in the repo for re-deployment (set `FABELLA_SHARE_TRACES=1` to resume Hub publish). |
+ | **Sharing is Caring** 📡 | Claimed | One anonymized row per request lands at [build-small-hackathon/fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) - schema, anonymization, and 5 seed rows in the public card. The Space publishes only when `FABELLA_SHARE_TRACES=1` is set (default `0`); parents can always pull their own data via the **Download my history** button regardless. |
  | **Field Notes** 📓 | Claimed | Blog/report on what was built and learned, by the maker. |
  | **Off the Grid** 🔌 | Skipped | Drafter, judge, and TTS all run on Modal - a cloud GPU platform, not "in front of you." |
  | **Well-Tuned** 🎯 | Skipped | No fine-tuning; Gemma 4 E4B-IT and Nemotron Nano 4B are used stock, no PEFT/LoRA, no published checkpoint on the Hub. |
@@ -168,8 +170,8 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
  "messages": [{"role": "parent", "content": "...", "age": 7, "tone": "gentle", "created_at": "..."}, ...],
  "memory": {"facts": [...], "summary": "...", "threads": [...], "history_turns": 4},
  "trace_publication": {
- "dataset": "Kiy-K/fabella-traces",
- "url": "https://huggingface.co/datasets/Kiy-K/fabella-traces",
+ "dataset": "build-small-hackathon/fabella-traces",
+ "url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
  "this_session_max_published_rows": 3,
  "this_session_max_turns": 4,
  "anonymization": [
@@ -182,7 +184,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
  }
  ```

- **Re-deploying the public dataset:** the `trace.py` publisher and the `Kiy-K/fabella-traces` schema are still in the repo. Set `FABELLA_SHARE_TRACES=1` on the Space to resume writing rows to that dataset. With the env var unset (or `0`), the publisher is a no-op and rows only live in the per-parent bucket.
+ **Re-deploying the public dataset:** the `trace.py` publisher and the `build-small-hackathon/fabella-traces` schema are still in the repo. Set `FABELLA_SHARE_TRACES=1` on the Space to resume writing rows to that dataset. With the env var unset (or `0`), the publisher is a no-op and rows only live in the per-parent bucket.

  ---

@@ -196,7 +198,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
  - `modal_app.py` - Modal deployment (drafter + judge on A10G, VoxCPM2 TTS on L4)
  - `memory.py` - bucket-backed parent memory and preference summaries for follow-up continuity
  - `safety.py` - input sanitization, profanity block, `explain_to_words(tone)`
- - `trace.py` - anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/Kiy-K/fabella-traces) dataset
+ - `trace.py` - anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) dataset

  ---

app.py CHANGED
@@ -660,8 +660,8 @@ async def api_history_download(request: Request, session_id: str = ""):
  "messages": _public_messages(history.get("messages", [])),
  "memory": memory_layer.public_view(mem),
  "trace_publication": {
- "dataset": "Kiy-K/fabella-traces",
- "url": "https://huggingface.co/datasets/Kiy-K/fabella-traces",
+ "dataset": "build-small-hackathon/fabella-traces",
+ "url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
  "this_session_max_published_rows": shared_count,
  "this_session_max_turns": turn_count,
  "anonymization": [
@@ -669,6 +669,7 @@ async def api_history_download(request: Request, session_id: str = ""):
  "Raw situation text is never stored; only its SHA-256 hash, the first 60 chars, and its length are kept.",
  "Freeform history turns are replaced with role + length counts in the published row.",
  "The drafter's static system prompt is shipped in full (it's a public string in this repo).",
+ "Each row is its own data/<trace_id>.json file (race-free across Space replicas).",
  ],
  },
  }
@@ -1110,7 +1111,7 @@ a { color: var(--accent-strong); }
  <p style="margin: 0 0 12px; font: 400 13px/1.4 var(--font-sans); color: var(--text-soft);">
  Your chat history and memory are stored in this Space's bucket, keyed to you.
  When you opt in, redacted copies of the drafter/judge trace are published to a
- <a href="https://huggingface.co/datasets/Kiy-K/fabella-traces" target="_blank" rel="noopener">public dataset</a>
+ <a href="https://huggingface.co/datasets/build-small-hackathon/fabella-traces" target="_blank" rel="noopener">public dataset</a>
  - they contain no raw situation text, no child name, and no trace that links back to you.
  </p>
  <div style="display:flex; gap: 8px; flex-wrap: wrap;">
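The anonymization notes in the app.py hunk describe two mechanisms: the situation text is reduced to a SHA-256 hash, its first 60 characters, and its length, and each published row gets its own `data/<trace_id>.json` file. The sketch below illustrates both under stated assumptions. The function names, the dict keys, and the use of `uuid4` for the trace id are hypothetical; the diff does not show how `trace.py` actually builds either value.

```python
import hashlib
import uuid
from typing import Optional

PREVIEW_CHARS = 60    # "the first 60 chars", per the anonymization note
DATASET_DIR = "data"  # matches DATASET_DIR in trace.py


def situation_fingerprint(text: str) -> dict:
    """Reduce raw situation text to hash + short preview + length."""
    return {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "preview": text[:PREVIEW_CHARS],
        "length": len(text),
    }


def row_path(trace_id: Optional[str] = None) -> str:
    """Build a per-row file path; a fresh random id is assumed per row."""
    trace_id = trace_id or uuid.uuid4().hex
    return f"{DATASET_DIR}/{trace_id}.json"


if __name__ == "__main__":
    print(situation_fingerprint("Our dog is very sick and may not come home."))
    print(row_path())
```

Writing one file per row means two Space replicas never append to the same file, which is the "race-free" property the note refers to.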
trace.py CHANGED
@@ -88,14 +88,15 @@ log = logging.getLogger("fabella.traces")

  # --- Configuration ---------------------------------------------------------

- # Where the public dataset lives. The default is the user's personal
- # namespace (``Kiy-K/fabella-traces``) because the build-small-hackathon
- # org's tokens are contributor-level and can't create new repos. To
- # publish to the org, an admin must pre-create the dataset and the
- # Space owner must override ``FABELLA_TRACE_REPO`` to the org path.
+ # Where the public dataset lives. The default is the hackathon org's
+ # namespace (``build-small-hackathon/fabella-traces``) so the live Space
+ # publishes to the canonical repo attached to the Space's "Datasets" tab.
+ # For local dev where the Space's HF_TOKEN cannot create repos in the org
+ # namespace, set ``FABELLA_TRACE_REPO=Kiy-K/fabella-traces`` (the maker's
+ # personal dataset) to publish to a fallback path.
  DATASET_REPO = os.environ.get(
  "FABELLA_TRACE_REPO",
- "Kiy-K/fabella-traces",
+ "build-small-hackathon/fabella-traces",
  )

  # Buffer flush triggers. The background flusher will push whenever EITHER
@@ -111,12 +112,12 @@ FLUSH_INTERVAL_S = float(os.environ.get("FABELLA_TRACE_FLUSH_INTERVAL_S", "300")
  DATASET_DIR = "data"
  PROBE_PATH = f"{DATASET_DIR}/.probe.json"

- # Capture is OFF by default. The public dataset was removed by the maker
- # for this demo; the only path to data is the per-parent "Download my
- # history" self-export button in ``app.py::api_history_download``. To
- # re-enable the public dataset for re-deployment, set
- # ``FABELLA_SHARE_TRACES=1`` on the Space and the publisher will resume
- # writing to ``Kiy-K/fabella-traces``.
+ # Capture is OFF by default. The dataset card and schema are live at
+ # ``build-small-hackathon/fabella-traces``; the only path to data right
+ # now is the per-parent "Download my history" self-export button in
+ # ``app.py::api_history_download``. To re-enable publishing for a new
+ # deployment, set ``FABELLA_SHARE_TRACES=1`` on the Space and the
+ # publisher will resume writing rows to the org dataset.
  SHARE_TRACES = os.environ.get("FABELLA_SHARE_TRACES", "0").lower() in (
  "1",
  "true",
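The two settings in the trace.py hunk combine with the per-request `share_trace` flag from DATASET.md to decide whether a row is published. The sketch below shows one way that resolution could look. The names `resolve_trace_config` and `should_publish` are hypothetical, and because the truthy tuple in the diff is cut off after `"true"`, only the two visible values are treated as enabling the publisher here.

```python
import os
from typing import Mapping

DEFAULT_REPO = "build-small-hackathon/fabella-traces"
# Only the values visible in the diff; the real tuple may contain more.
VISIBLE_TRUTHY = ("1", "true")


def resolve_trace_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve target repo and publish flag from environment variables."""
    repo = env.get("FABELLA_TRACE_REPO", DEFAULT_REPO)
    share = env.get("FABELLA_SHARE_TRACES", "0").lower() in VISIBLE_TRUTHY
    return {"repo": repo, "share": share}


def should_publish(config: dict, share_trace: bool = True) -> bool:
    """Publish only if the Space-level flag is on AND the request opts in."""
    return bool(config["share"]) and share_trace


if __name__ == "__main__":
    # Default: publisher off, org dataset as the target.
    print(resolve_trace_config({}))
    # Local dev fallback to the maker's personal namespace, publisher on.
    dev_env = {"FABELLA_TRACE_REPO": "Kiy-K/fabella-traces", "FABELLA_SHARE_TRACES": "1"}
    print(resolve_trace_config(dev_env))
```

Keeping the Space-level flag and the per-request flag as separate inputs means either one alone is enough to stop a row from leaving the Space.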