docs: point all dataset references at build-small-hackathon/fabella-traces
The canonical public dataset now lives in the hackathon org namespace: it
is public, with a real card and 5 seed rows. Update every reference in the
repo:
- README.md YAML: add 'datasets: - build-small-hackathon/fabella-traces'
so the dataset shows up in the Space's Datasets tab.
- README.md body + files list: 4 references moved from Kiy-K to the
build-small-hackathon namespace; the Sharing is Caring badge note now
reads as 'claimed' with a link to the live dataset.
- DATASET.md: default dataset path moved to the org namespace; the
Kiy-K personal-namespace mention is downgraded to a 'local dev
fallback' note.
- app.py: the 'trace_publication' payload in the parent's 'Download my
history' bundle now points at the org dataset; the settings dialog link
in INDEX_HTML points at the org dataset; the anonymization list gains a
per-row UUID note.
- trace.py: DATASET_REPO default switched from Kiy-K/fabella-traces to
build-small-hackathon/fabella-traces so the live Space publishes to
the canonical repo on the Space's Datasets tab. The maker's personal
namespace is now a documented FABELLA_TRACE_REPO fallback for local
dev only.
Beyond retargeting the dataset, no behavior changes: the per-row UUID
publisher, the atomic probe, the FABELLA_SHARE_TRACES=0 default, the
per-parent self-export path, and the rest of the architecture are unchanged.
```diff
--- a/DATASET.md
+++ b/DATASET.md
@@ -1,20 +1,22 @@
 # Fabella Anonymized Agent Traces
 
-
+When the public publisher is enabled, one anonymized row per request lands at
+[`build-small-hackathon/fabella-traces`](https://huggingface.co/datasets/build-small-hackathon/fabella-traces).
+The dataset repo is public and contains the live card, the publisher schema, and 5 seed rows for the data viewer. The Space publishes to it only when `FABELLA_SHARE_TRACES=1` is set on the Space (default `0`); with the flag off, rows only live in the per-parent bucket and are exportable via the Space's **Settings → Download my history** button.
 
-This is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card is the source of truth; this file is a convenience for reviewers reading the source repo.
+This file is a local copy of the Hugging Face dataset card for Fabella's anonymized agent-trace publication. The live card on the Hub is the source of truth; this file is a convenience for reviewers reading the source repo.
 
 ## Why this exists
 
 Fabella is a small Hugging Face Space that drafts short, kind, age-appropriate explanations a parent can use to talk to their child about a hard thing (a hospitalization, a move, a pet dying). The first version is generated by `google/gemma-4-E4B-it`; a `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` judge scores it against a six-criterion rubric; the drafter revises if needed. An optional read-aloud uses `openbmb/VoxCPM2`.
 
-
+The dataset captures the full ReAct loop for every successful generation, with PII removed before the row leaves the Space. It's the **Sharing is Caring** merit-badge artifact for the [Build Small Hackathon](https://huggingface.co/spaces/build-small-hackathon/README) submission.
 
 ## Where it lives
 
-- **Live dataset (when enabled):** https://huggingface.co/datasets/
-- **
-- **
+- **Live dataset (when enabled):** https://huggingface.co/datasets/build-small-hackathon/fabella-traces (default)
+- **Personal-namespace fallback:** set `FABELLA_TRACE_REPO=Kiy-K/fabella-traces` to publish to the maker's personal dataset instead. Used for local dev where the Space's HF_TOKEN can't create in the org namespace.
+- **Default for this demo:** the publisher is OFF (`FABELLA_SHARE_TRACES=0`); data lives in the per-parent bucket and is exportable via the Space's **Settings → Download my history** button.
 
 The dataset repo is created on first publish by `trace.py::_ensure_repo` via `HfApi.create_repo(exist_ok=True)`, so a fresh deployment does not require a one-time setup step.
 
@@ -80,7 +82,7 @@ The download path exists for two reasons: (1) the parent can verify that what we
 
 ## Opt-out
 
-- Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the
+- Set `FABELLA_SHARE_TRACES=0` on the Space to kill the publisher (this is the default).
 - Or pass `share_trace=False` on `make_explanation` to opt out per request.
 
 The default for this demo is **off**: rows only live in the per-parent bucket, accessible via the Download button.
```
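The publish target and the on/off switch described in this card come from two environment variables plus a per-request flag. The sketch below models that resolution under stated assumptions: `resolve_publisher` is a hypothetical helper, not code from `trace.py`, and the truthy spellings are assumed to be `"1"` and `"true"` because the full tuple in `trace.py` is cut off in this view.

```python
from typing import Mapping, Optional

# Default target introduced by this commit; FABELLA_TRACE_REPO overrides it.
DEFAULT_REPO = "build-small-hackathon/fabella-traces"
# Assumption: only these spellings count as "on" (trace.py's tuple is truncated here).
TRUTHY = ("1", "true")


def resolve_publisher(env: Mapping[str, str], share_trace: bool = True) -> Optional[str]:
    """Return the dataset repo to publish to, or None when publishing is off.

    Hypothetical helper that mirrors the documented behavior, not trace.py itself.
    """
    # Global switch: FABELLA_SHARE_TRACES defaults to "0", so publishing is off.
    enabled = env.get("FABELLA_SHARE_TRACES", "0").lower() in TRUTHY
    # Per-request opt-out, analogous to share_trace=False on make_explanation.
    if not (enabled and share_trace):
        return None
    # Org dataset by default; the personal namespace is an explicit override.
    return env.get("FABELLA_TRACE_REPO", DEFAULT_REPO)
```

With an empty environment it returns `None` (the demo default); setting `FABELLA_SHARE_TRACES=1` yields the org dataset, and adding `FABELLA_TRACE_REPO=Kiy-K/fabella-traces` switches to the personal fallback.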
````diff
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@ pinned: true
 hf_oauth: true
 license: apache-2.0
 short_description: Small words for big questions.
+datasets:
+- build-small-hackathon/fabella-traces
 tags:
 - track:backyard
 - sponsor:openbmb
@@ -135,7 +137,7 @@ Three claimed, three skipped. Fabella's honest inventory:
 | Badge | Status | Why |
 |---|---|---|
 | **Off-Brand** | Claimed | Custom HTML+CSS+JS frontend served by `gradio.Server` - zero default Gradio chrome. |
-| **Sharing is Caring** |
+| **Sharing is Caring** | Claimed | One anonymized row per request lands at [build-small-hackathon/fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) - schema, anonymization, and 5 seed rows in the public card. The Space publishes only when `FABELLA_SHARE_TRACES=1` is set (default `0`); parents can always pull their own data via the **Download my history** button regardless. |
 | **Field Notes** | Claimed | Blog/report on what was built and learned, by the maker. |
 | **Off the Grid** | Skipped | Drafter, judge, and TTS all run on Modal - a cloud GPU platform, not "in front of you." |
 | **Well-Tuned** | Skipped | No fine-tuning; Gemma 4 E4B-IT and Nemotron Nano 4B are used stock, no PEFT/LoRA, no published checkpoint on the Hub. |
@@ -168,8 +170,8 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
 "messages": [{"role": "parent", "content": "...", "age": 7, "tone": "gentle", "created_at": "..."}, ...],
 "memory": {"facts": [...], "summary": "...", "threads": [...], "history_turns": 4},
 "trace_publication": {
-"dataset": "
-"url": "https://huggingface.co/datasets/
+"dataset": "build-small-hackathon/fabella-traces",
+"url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
 "this_session_max_published_rows": 3,
 "this_session_max_turns": 4,
 "anonymization": [
@@ -182,7 +184,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
 }
 ```
 
-**Re-deploying the public dataset:** the `trace.py` publisher and the `
+**Re-deploying the public dataset:** the `trace.py` publisher and the `build-small-hackathon/fabella-traces` schema are still in the repo. Set `FABELLA_SHARE_TRACES=1` on the Space to resume writing rows to that dataset. With the env var unset (or `0`), the publisher is a no-op and rows only live in the per-parent bucket.
 
 ---
 
@@ -196,7 +198,7 @@ For this demo, the public dataset was removed by the maker. Parents pull their o
 - `modal_app.py` - Modal deployment (drafter + judge on A10G, VoxCPM2 TTS on L4)
 - `memory.py` - bucket-backed parent memory and preference summaries for follow-up continuity
 - `safety.py` - input sanitization, profanity block, `explain_to_words(tone)`
-- `trace.py` - anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/
+- `trace.py` - anonymized ReAct-trace capture and Hub publishing for the [fabella-traces](https://huggingface.co/datasets/build-small-hackathon/fabella-traces) dataset
 
 ---
 
````
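Because the Download my history bundle echoes the `trace_publication` target, a parent or reviewer can confirm which dataset a session would publish to. A minimal sketch, assuming the bundle is a JSON object shaped like the README excerpt in this commit and that `url` is always the Hub datasets prefix followed by the `dataset` id (true of the values in this diff); `check_trace_publication` is a hypothetical helper:

```python
import json

# Assumption: the bundle's url is this prefix followed by the dataset id.
HUB_DATASETS = "https://huggingface.co/datasets/"


def check_trace_publication(bundle_text: str, expected_repo: str) -> bool:
    """Return True if the bundle's trace_publication names expected_repo consistently."""
    bundle = json.loads(bundle_text)
    pub = bundle.get("trace_publication", {})
    # Both fields must agree with each other and with the expected repo id.
    return (
        pub.get("dataset") == expected_repo
        and pub.get("url") == HUB_DATASETS + expected_repo
    )
```

Checking `dataset` and `url` together catches a bundle where only one of the two references was updated.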
```diff
--- a/app.py
+++ b/app.py
@@ -660,8 +660,8 @@ async def api_history_download(request: Request, session_id: str = ""):
 "messages": _public_messages(history.get("messages", [])),
 "memory": memory_layer.public_view(mem),
 "trace_publication": {
-"dataset": "
-"url": "https://huggingface.co/datasets/
+"dataset": "build-small-hackathon/fabella-traces",
+"url": "https://huggingface.co/datasets/build-small-hackathon/fabella-traces",
 "this_session_max_published_rows": shared_count,
 "this_session_max_turns": turn_count,
 "anonymization": [
@@ -669,6 +669,7 @@ async def api_history_download(request: Request, session_id: str = ""):
 "Raw situation text is never stored; only its SHA-256 hash, the first 60 chars, and its length are kept.",
 "Freeform history turns are replaced with role + length counts in the published row.",
 "The drafter's static system prompt is shipped in full (it's a public string in this repo).",
+"Each row is its own data/<trace_id>.json file (race-free across Space replicas).",
 ],
 },
 }
@@ -1110,7 +1111,7 @@ a { color: var(--accent-strong); }
 <p style="margin: 0 0 12px; font: 400 13px/1.4 var(--font-sans); color: var(--text-soft);">
 Your chat history and memory are stored in this Space's bucket, keyed to you.
 When you opt in, redacted copies of the drafter/judge trace are published to a
-<a href="https://huggingface.co/datasets/
+<a href="https://huggingface.co/datasets/build-small-hackathon/fabella-traces" target="_blank" rel="noopener">public dataset</a>
 - they contain no raw situation text, no child name, and no trace that links back to you.
 </p>
 <div style="display:flex; gap: 8px; flex-wrap: wrap;">
```
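The first two anonymization rules in the payload's list are easy to state in code. A minimal sketch, assuming the hash is taken over the UTF-8 bytes of the text, "first 60 chars" is a plain prefix slice, and each history turn is reduced to its role and character length (one reading of "role + length counts"); the function and field names are hypothetical, not taken from `app.py` or `trace.py`:

```python
import hashlib
from typing import Any, Dict, List


def summarize_situation(text: str) -> Dict[str, Any]:
    """Keep only a SHA-256 hash, a 60-char prefix, and the length of the raw text."""
    return {
        # Assumption: the hash covers the UTF-8 encoding of the full text.
        "situation_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "situation_prefix": text[:60],
        "situation_len": len(text),
    }


def summarize_turns(turns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace each freeform turn with its role and the length of its content."""
    return [
        {"role": t.get("role", "unknown"), "length": len(t.get("content", ""))}
        for t in turns
    ]
```

The hash lets identical situations be grouped without storing them, while the prefix and length give the data viewer something readable.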
```diff
--- a/trace.py
+++ b/trace.py
@@ -88,14 +88,15 @@ log = logging.getLogger("fabella.traces")
 
 # --- Configuration ---------------------------------------------------------
 
-# Where the public dataset lives. The default is the
-# namespace (``
-#
-#
-#
+# Where the public dataset lives. The default is the hackathon org's
+# namespace (``build-small-hackathon/fabella-traces``) so the live Space
+# publishes to the canonical repo attached to the Space's "Datasets" tab.
+# For local dev where the Space's HF_TOKEN cannot create repos in the org
+# namespace, set ``FABELLA_TRACE_REPO=Kiy-K/fabella-traces`` (the maker's
+# personal dataset) to publish to a fallback path.
 DATASET_REPO = os.environ.get(
 "FABELLA_TRACE_REPO",
-"
+"build-small-hackathon/fabella-traces",
 )
 
 # Buffer flush triggers. The background flusher will push whenever EITHER
@@ -111,12 +112,12 @@ FLUSH_INTERVAL_S = float(os.environ.get("FABELLA_TRACE_FLUSH_INTERVAL_S", "300")
 DATASET_DIR = "data"
 PROBE_PATH = f"{DATASET_DIR}/.probe.json"
 
-# Capture is OFF by default. The
-#
-# history" self-export button in
-# re-enable
-# ``FABELLA_SHARE_TRACES=1`` on the Space and the
-# writing to
+# Capture is OFF by default. The dataset card and schema are live at
+# ``build-small-hackathon/fabella-traces``; the only path to data right
+# now is the per-parent "Download my history" self-export button in
+# ``app.py::api_history_download``. To re-enable publishing for a new
+# deployment, set ``FABELLA_SHARE_TRACES=1`` on the Space and the
+# publisher will resume writing rows to the org dataset.
 SHARE_TRACES = os.environ.get("FABELLA_SHARE_TRACES", "0").lower() in (
 "1",
 "true",
```