web / README.md
monsimas's picture
Deploy ingestion server + website (app.py, scrub.py, index.html, Dockerfile)
52b30dc verified
|
Raw
History Blame Contribute Delete
2.19 kB
---
title: Trace Commons
emoji: 🧵
colorFrom: indigo
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
short_description: Open commons of anonymized coding-agent traces
---
# Trace Commons — website + ingestion server
This Space serves two things from one origin:
- **`GET /`** — the Trace Commons website (`index.html`).
- **`POST /donate`** — the anonymous ingestion endpoint used by the
[`donate-trace`](https://github.com/trace-commons/donate-trace) skill.
A donation is a single, already-locally-scrubbed coding-agent session. The
server re-runs the **exact same** deterministic scrubber (`scrub.py`, kept
byte-identical to the skill's) as a backstop, refuses anything that still trips
a high-confidence secret pattern, and opens a **pull request** (never a direct
push) to the dataset under a project-owned token. Contributors need no Hugging
Face account.
## Endpoints
| Method | Path | Purpose |
|--------|------------|------------------------------------------------------|
| GET | `/` | The website |
| GET | `/health` | `{configured, dataset}``configured` is true once secrets are set |
| POST | `/donate` | Accept a scrubbed trace, backstop-scrub, open a PR |
## Configuration (Space secrets)
| Secret | Required | Meaning |
|-----------------|----------|-----------------------------------------------------|
| `HF_TOKEN` | yes | Write token for the project bot account |
| `DATASET_REPO` | yes | e.g. `trace-commons/agent-traces` |
| `MAX_BYTES` | no | Max payload size (default 5,000,000) |
| `RATE_PER_HOUR` | no | Donations per IP per hour (default 20) |
If `HF_TOKEN`/`DATASET_REPO` are unset, `/donate` validates and scrubs the
payload but returns `503 validated_not_published` instead of opening a PR.
## Dataset
Donations land in [`trace-commons/agent-traces`](https://huggingface.co/datasets/trace-commons/agent-traces)
under `sessions/<harness>/<filename>`, licensed **CC-BY-4.0**.