Log audit to a private out-of-org bucket
Audit records carry sensitive metadata (caller_ip, user_agent, source
URIs). They were written to gemma-main-bucket/audit/, readable by every
org contributor. Move them to a private bucket outside the org
(cmpatino/gemma-challenge-audit), owned by the Space's HF_TOKEN (Option A).
- config.py: new fixed audit_bucket constant.
- hub.py: append_jsonl_central -> append_jsonl_audit, backed by a generic
_append_jsonl(bucket, ...) targeting the audit bucket (read-modify-write
now reads+writes there).
- audit.py: retarget the single call site.
- DESIGN.md: update §8, constants, file layout, and deployment bootstrap.
Existing audit history was moved to the new bucket out-of-band.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- DESIGN.md +12 -6
- app/audit.py +1 -1
- app/config.py +3 -0
- app/hub.py +11 -3
DESIGN.md
CHANGED
|
@@ -34,7 +34,7 @@ This Space holds the API for **one** collab (here: the Gemma collab in `gemma-ch
|
|
| 34 |
| Shared resource | `shared_resources/…_{agent_id}{.ext|/…}` (the `_{agent_id}` segment is mandatory somewhere in the leaf path) |
|
| 35 |
| Audit log | `audit/{YYYYMM}.jsonl` |
|
| 36 |
|
| 37 |
-
Space-config constants: `ORG=gemma-challenge`, `COLLAB_SLUG=gemma`, `CENTRAL_BUCKET=gemma-challenge/gemma-main-bucket`.
|
| 38 |
|
| 39 |
### HF Hub dependencies
|
| 40 |
- `huggingface_hub` SDK (or equivalent REST calls): bucket existence check, bucket creator metadata, bucket list, bucket cp/sync, bucket-to-bucket copy.
|
|
@@ -237,7 +237,7 @@ Rate-limit responses include `Retry-After`.
|
|
| 237 |
|
| 238 |
## 8. Audit Log
|
| 239 |
|
| 240 |
-
Append-one-JSON-line-per-write to `audit/{YYYYMM}.jsonl` in the
|
| 241 |
|
| 242 |
```json
|
| 243 |
{
|
|
@@ -292,7 +292,7 @@ bucket-sync/
|
|
| 292 |
DESIGN.md - this document
|
| 293 |
app/
|
| 294 |
main.py - FastAPI entry; mounts routers
|
| 295 |
-
config.py - fixed: ORG, COLLAB_SLUG, CENTRAL_BUCKET; env: HF_TOKEN
|
| 296 |
hub.py - huggingface_hub wrapper (read/write/copy/list/whoami/creator)
|
| 297 |
naming.py - derives bucket names, filenames, target paths
|
| 298 |
frontmatter.py - parse / merge / serialise YAML frontmatter
|
|
@@ -322,15 +322,21 @@ Synchronous: the client waits for the copy to finish. Implementation streams obj
|
|
| 322 |
# 1. Create central bucket as admin
|
| 323 |
hf buckets create gemma-challenge/gemma-main-bucket
|
| 324 |
|
| 325 |
-
#
|
| 326 |
hf buckets cp ./README.md hf://buckets/gemma-challenge/gemma-main-bucket/
|
| 327 |
hf buckets cp ./enwik8 hf://buckets/gemma-challenge/gemma-main-bucket/shared_resources/
|
| 328 |
hf buckets cp ./shared_resources/README.md \
|
| 329 |
hf://buckets/gemma-challenge/gemma-main-bucket/shared_resources/
|
| 330 |
|
| 331 |
# 3. Configure this Space (Settings → Secrets)
|
| 332 |
-
# HF_TOKEN = <
|
| 333 |
-
#
|
| 334 |
|
| 335 |
# 4. Push code. Space rebuilds. Health-check:
|
| 336 |
curl https://gemma-challenge-gemma-bucket-sync.hf.space/v1/healthz
|
|
|
|
| 34 |
| Shared resource | `shared_resources/…_{agent_id}{.ext|/…}` (the `_{agent_id}` segment is mandatory somewhere in the leaf path) |
|
| 35 |
| Audit log | `audit/{YYYYMM}.jsonl` |
|
| 36 |
|
| 37 |
+
Space-config constants: `ORG=gemma-challenge`, `COLLAB_SLUG=gemma`, `CENTRAL_BUCKET=gemma-challenge/gemma-main-bucket`, `AUDIT_BUCKET=cmpatino/gemma-challenge-audit` (private, out-of-org).
|
| 38 |
|
| 39 |
### HF Hub dependencies
|
| 40 |
- `huggingface_hub` SDK (or equivalent REST calls): bucket existence check, bucket creator metadata, bucket list, bucket cp/sync, bucket-to-bucket copy.
|
|
|
|
| 237 |
|
| 238 |
## 8. Audit Log
|
| 239 |
|
| 240 |
+
Append-one-JSON-line-per-write to `audit/{YYYYMM}.jsonl` in the **private audit bucket** (`cmpatino/gemma-challenge-audit`), via admin token. Append-only. This bucket lives **outside the org** so collab contributors cannot read it; audit records carry sensitive metadata (`caller_ip`, `user_agent`, source URIs). The Space's `HF_TOKEN` owns the bucket and is the only reader/writer (Option A; see §11).
|
| 241 |
|
| 242 |
```json
|
| 243 |
{
|
|
|
|
| 292 |
DESIGN.md - this document
|
| 293 |
app/
|
| 294 |
main.py - FastAPI entry; mounts routers
|
| 295 |
+
config.py - fixed: ORG, COLLAB_SLUG, CENTRAL_BUCKET, AUDIT_BUCKET; env: HF_TOKEN
|
| 296 |
hub.py - huggingface_hub wrapper (read/write/copy/list/whoami/creator)
|
| 297 |
naming.py - derives bucket names, filenames, target paths
|
| 298 |
frontmatter.py - parse / merge / serialise YAML frontmatter
|
|
|
|
| 322 |
# 1. Create central bucket as admin
|
| 323 |
hf buckets create gemma-challenge/gemma-main-bucket
|
| 324 |
|
| 325 |
+
# 1b. Create the PRIVATE audit bucket, outside the org, owned by the
|
| 326 |
+
# account whose token is the Space's HF_TOKEN (Option A). Contributors
|
| 327 |
+
# must NOT be able to read it.
|
| 328 |
+
hf buckets create cmpatino/gemma-challenge-audit --private
|
| 329 |
+
|
| 330 |
+
# 2. Seed the central bucket
|
| 331 |
hf buckets cp ./README.md hf://buckets/gemma-challenge/gemma-main-bucket/
|
| 332 |
hf buckets cp ./enwik8 hf://buckets/gemma-challenge/gemma-main-bucket/shared_resources/
|
| 333 |
hf buckets cp ./shared_resources/README.md \
|
| 334 |
hf://buckets/gemma-challenge/gemma-main-bucket/shared_resources/
|
| 335 |
|
| 336 |
# 3. Configure this Space (Settings → Secrets)
|
| 337 |
+
# HF_TOKEN = <token that is BOTH gemma-challenge org-admin AND owner
|
| 338 |
+
# of cmpatino/gemma-challenge-audit, with repo write>
|
| 339 |
+
# (ORG, COLLAB_SLUG, CENTRAL_BUCKET, AUDIT_BUCKET are fixed in config.py)
|
| 340 |
|
| 341 |
# 4. Push code. Space rebuilds. Health-check:
|
| 342 |
curl https://gemma-challenge-gemma-bucket-sync.hf.space/v1/healthz
|
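The DESIGN.md changes keep the `audit/{YYYYMM}.jsonl` path convention and only change which bucket it lives in. The helper that builds that path, `audit_log_path`, is called in `app/audit.py` but its body is not part of this diff. A minimal sketch of what it might look like, assuming the month is taken in UTC and the argument is a timezone-aware datetime (both are assumptions):

```python
from datetime import datetime, timezone


def audit_log_path(now: datetime) -> str:
    """Return the monthly audit file path, e.g. audit/202503.jsonl.

    Assumption: `now` is timezone-aware and the month boundary is UTC;
    the real helper in the repository may differ.
    """
    utc_now = now.astimezone(timezone.utc)
    return f"audit/{utc_now:%Y%m}.jsonl"


example = audit_log_path(datetime(2025, 3, 7, 12, 0, tzinfo=timezone.utc))
```

With one file per month, each append rewrites at most one month of history, which bounds the cost of the read-modify-write in `hub.py`.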
app/audit.py
CHANGED
|
@@ -51,6 +51,6 @@ class AuditLogger:
|
|
| 51 |
line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
|
| 52 |
path = audit_log_path(now)
|
| 53 |
try:
|
| 54 |
-
self._hub.
|
| 55 |
except Exception:
|
| 56 |
log.exception("audit append failed; record=%s", record)
|
|
|
|
| 51 |
line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
|
| 52 |
path = audit_log_path(now)
|
| 53 |
try:
|
| 54 |
+
self._hub.append_jsonl_audit(path, line)
|
| 55 |
except Exception:
|
| 56 |
log.exception("audit append failed; record=%s", record)
|
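The call site above wraps the append in a broad `try/except`, so a failed audit write is logged rather than propagated to the caller. A self-contained sketch of that best-effort pattern follows; `FailingHub`, the `write` method name, and its boolean return value are hypothetical and exist only to make the behaviour observable here.

```python
import json
import logging

log = logging.getLogger("audit")


class FailingHub:
    """Hypothetical stand-in for HubClient whose append always fails."""

    def append_jsonl_audit(self, path: str, line: str) -> None:
        raise ConnectionError("hub unreachable")


class AuditLogger:
    def __init__(self, hub) -> None:
        self._hub = hub

    def write(self, record: dict, path: str) -> bool:
        # Compact separators keep each record on one short line.
        line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        try:
            self._hub.append_jsonl_audit(path, line)
        except Exception:
            # Swallow the error so the caller is not affected; the record
            # is written to the process log instead of the bucket.
            log.exception("audit append failed; record=%s", record)
            return False
        return True


ok = AuditLogger(FailingHub()).write({"op": "sync"}, "audit/202503.jsonl")
```

One consequence worth keeping in mind: on failure the full record, including fields such as `caller_ip`, ends up in the Space's own logs, so those logs carry the same sensitivity as the audit bucket.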
app/config.py
CHANGED
|
@@ -13,6 +13,9 @@ class Settings(BaseSettings):
|
|
| 13 |
org: ClassVar[str] = "gemma-challenge"
|
| 14 |
collab_slug: ClassVar[str] = "gemma"
|
| 15 |
central_bucket: ClassVar[str] = "gemma-challenge/gemma-main-bucket"
|
| 16 |
|
| 17 |
hf_token: str | None = Field(None, alias="HF_TOKEN")
|
| 18 |
|
|
|
|
| 13 |
org: ClassVar[str] = "gemma-challenge"
|
| 14 |
collab_slug: ClassVar[str] = "gemma"
|
| 15 |
central_bucket: ClassVar[str] = "gemma-challenge/gemma-main-bucket"
|
| 16 |
+
# Audit log lives in a private bucket OUTSIDE the org so collab members
|
| 17 |
+
# cannot read it. The Space's HF_TOKEN owns this bucket (Option A).
|
| 18 |
+
audit_bucket: ClassVar[str] = "cmpatino/gemma-challenge-audit"
|
| 19 |
|
| 20 |
hf_token: str | None = Field(None, alias="HF_TOKEN")
|
| 21 |
|
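Declaring `audit_bucket` as a `ClassVar` keeps it out of the set of configurable fields, so only `HF_TOKEN` varies per deployment. The sketch below illustrates the same idea with a standard-library dataclass instead of the `BaseSettings` class used in `app/config.py`; the dataclass is a stand-in chosen to keep the example dependency-free, not the project's actual implementation.

```python
import os
from dataclasses import dataclass, field, fields
from typing import ClassVar, Optional


@dataclass
class Settings:
    # Fixed constants: ClassVar attributes are not dataclass fields, so
    # they cannot be supplied through the constructor.
    central_bucket: ClassVar[str] = "gemma-challenge/gemma-main-bucket"
    audit_bucket: ClassVar[str] = "cmpatino/gemma-challenge-audit"

    # The only per-deployment value, read from the environment.
    hf_token: Optional[str] = field(
        default_factory=lambda: os.environ.get("HF_TOKEN")
    )


settings = Settings()
configurable = [f.name for f in fields(Settings)]
```

Here `configurable` contains only `hf_token`, and passing `audit_bucket=...` to the constructor raises `TypeError`.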
app/hub.py
CHANGED
|
@@ -142,14 +142,22 @@ class HubClient:
|
|
| 142 |
def write_text_central(self, target_path: str, text: str) -> None:
|
| 143 |
self.write_bytes_central(target_path, text.encode("utf-8"))
|
| 144 |
|
| 145 |
-
def
|
| 146 |
try:
|
| 147 |
-
existing = self.
|
| 148 |
except FileNotFoundError:
|
| 149 |
existing = b""
|
| 150 |
if existing and not existing.endswith(b"\n"):
|
| 151 |
existing += b"\n"
|
| 152 |
-
|
| 153 |
|
| 154 |
# ───────────────────────── Cross-bucket copy ─────────────────────────
|
| 155 |
|
|
|
|
| 142 |
def write_text_central(self, target_path: str, text: str) -> None:
|
| 143 |
self.write_bytes_central(target_path, text.encode("utf-8"))
|
| 144 |
|
| 145 |
+
def append_jsonl_audit(self, target_path: str, line: str) -> None:
|
| 146 |
+
"""Append to the audit log in the private (out-of-org) audit bucket."""
|
| 147 |
+
self._append_jsonl(self._settings.audit_bucket, target_path, line)
|
| 148 |
+
|
| 149 |
+
def _append_jsonl(self, bucket: str, target_path: str, line: str) -> None:
|
| 150 |
try:
|
| 151 |
+
existing = self._download_one(bucket, target_path)
|
| 152 |
except FileNotFoundError:
|
| 153 |
existing = b""
|
| 154 |
if existing and not existing.endswith(b"\n"):
|
| 155 |
existing += b"\n"
|
| 156 |
+
batch_bucket_files(
|
| 157 |
+
bucket_id=bucket,
|
| 158 |
+
add=[(existing + line.encode("utf-8") + b"\n", target_path)],
|
| 159 |
+
token=self._token,
|
| 160 |
+
)
|
| 161 |
|
| 162 |
# ───────────────────────── Cross-bucket copy ─────────────────────────
|
| 163 |
|
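The new `_append_jsonl` is a read-modify-write: download the existing object, repair a missing trailing newline, then upload the old content plus the new line. The sketch below reproduces that logic against an in-memory dictionary so it can be run without Hub access; `InMemoryBucket` and its `download`/`upload` methods are hypothetical stand-ins for `_download_one` and `batch_bucket_files`.

```python
class InMemoryBucket:
    """Hypothetical stand-in for a Hub bucket: maps path -> bytes."""

    def __init__(self) -> None:
        self.files = {}

    def download(self, path: str) -> bytes:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data


def append_jsonl(bucket: InMemoryBucket, path: str, line: str) -> None:
    # Read: a missing file is treated as an empty log.
    try:
        existing = bucket.download(path)
    except FileNotFoundError:
        existing = b""
    # Modify: repair a missing trailing newline so records never fuse.
    if existing and not existing.endswith(b"\n"):
        existing += b"\n"
    # Write: replace the whole object with old content plus the new line.
    bucket.upload(path, existing + line.encode("utf-8") + b"\n")


bucket = InMemoryBucket()
bucket.files["audit/202503.jsonl"] = b'{"n":1}'  # no trailing newline
append_jsonl(bucket, "audit/202503.jsonl", '{"n":2}')
append_jsonl(bucket, "audit/202504.jsonl", '{"n":3}')
```

Because the write replaces the whole object, two overlapping appends could in principle drop a record unless callers are serialized; whether the Space guards against that is not visible in this diff.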