Prototype/proposal repo: https://github.com/frumu-ai/trace-share
Goal: an opt-in Rust CLI that ingests local coding-agent logs (Codex/Claude/VS Code agents), scrubs secrets/PII locally (gitleaks + deterministic redaction), and exports structured “episodes” intended for OSS model training (SFT + tool-use traces).
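To make "deterministic redaction" concrete, here is a minimal sketch of the idea, not the code in the repo. It assumes secrets can be spotted by a few well-known token prefixes (the list `SECRET_PREFIXES` is hypothetical) and replaces each one with a placeholder derived from a stable hash. The same secret therefore always maps to the same placeholder, which keeps episodes internally consistent. The function names `fnv1a64`, `looks_like_secret`, and `redact_line` are made up for illustration. A real pipeline would probably rely on gitleaks rules for detection and a keyed hash for the placeholder.

```rust
/// 64-bit FNV-1a hash. Used here only because its output is stable across
/// runs and platforms; it is NOT cryptographic (assumption: a real tool
/// would use a keyed hash so placeholders cannot be brute-forced).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical prefix list for illustration; real detection would be
/// rule-based (e.g. gitleaks) rather than this hard-coded set.
const SECRET_PREFIXES: [&str; 3] = ["sk-", "ghp_", "AKIA"];

/// Treat a token as a secret if it is reasonably long and starts with a
/// known prefix.
fn looks_like_secret(token: &str) -> bool {
    token.len() >= 12 && SECRET_PREFIXES.iter().any(|p| token.starts_with(p))
}

/// Replace every secret-looking, space-separated token in `line` with a
/// placeholder that depends only on the token's bytes.
fn redact_line(line: &str) -> String {
    line.split(' ')
        .map(|tok| {
            if looks_like_secret(tok) {
                format!("<REDACTED:{:016x}>", fnv1a64(tok.as_bytes()))
            } else {
                tok.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling `redact_line` twice on the same input yields identical output, so reruns of the export are reproducible and dedupe by content still works after scrubbing.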
Status: local-only right now (uploads nothing). Two pieces are missing: a home to publish versioned dataset snapshots (JSONL/Parquet plus manifests and checksums), and an optional vector index for search, dedupe, and curation.
Hugging Face is the most natural distribution channel because the end product is a Hub Dataset (versioned downloadable snapshots). I’m looking for direct Hugging Face support (recommended dataset layout + publishing workflow, and ideally storage/bandwidth support as releases scale).
Scale reference: ~700 MB of raw Codex logs → ~36 MB sanitized export (about 5% of the original size).