frumu 
posted an update about 10 hours ago
Prototype/proposal repo: https://github.com/frumu-ai/trace-share

Goal: an opt-in Rust CLI that ingests local coding-agent logs (Codex/Claude/VS Code agents), scrubs secrets/PII locally (gitleaks + deterministic redaction), and exports structured “episodes” intended for OSS model training (SFT + tool-use traces).
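To make the "deterministic redaction" and "episode" ideas concrete, here is a minimal Python sketch. It is not the repo's implementation (the actual CLI is Rust and also runs gitleaks); the two regex patterns, the `redact` and `to_episode` names, the salt, and the episode shape are all assumptions made for illustration. The point it shows is that the same secret always maps to the same placeholder, so references stay consistent across a trace without exposing the value.

```python
import hashlib
import re

# Hypothetical patterns for illustration only; a real scrubber such as
# gitleaks ships a far larger rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str, salt: str = "local-salt") -> str:
    """Replace each match with a stable placeholder derived from a salted hash."""

    def make_sub(label):
        def sub(match):
            raw = (salt + match.group(0)).encode("utf-8")
            digest = hashlib.sha256(raw).hexdigest()[:8]
            return f"<{label}:{digest}>"

        return sub

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(label), text)
    return text


def to_episode(turns):
    """Build an assumed episode shape: a list of redacted role/content turns."""
    return {"turns": [{"role": role, "content": redact(content)} for role, content in turns]}
```

For example, `to_episode([("user", "ping bob@example.com")])` yields a single turn whose content contains an `<EMAIL:...>` placeholder in place of the address.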

Status: local-only right now (nothing is uploaded). Two pieces are missing: a home to publish versioned dataset snapshots (JSONL/Parquet + manifests/checksums), and an optional vector index for search/dedupe/curation.
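As a rough sketch of what a versioned snapshot with a manifest and checksum could look like, the Python below writes episodes to JSONL and records a SHA-256 per file. The directory layout, the file names `episodes.jsonl` and `manifest.json`, the manifest fields, and the `write_snapshot` helper are assumptions for illustration, not the project's actual format.

```python
import hashlib
import json
from pathlib import Path


def write_snapshot(episodes, out_dir, version):
    """Write episodes as JSONL under out_dir/version and emit a manifest with a checksum."""
    out = Path(out_dir) / version
    out.mkdir(parents=True, exist_ok=True)

    data_path = out / "episodes.jsonl"
    with data_path.open("w", encoding="utf-8") as f:
        for episode in episodes:
            # sort_keys keeps the serialized bytes stable for identical input.
            f.write(json.dumps(episode, sort_keys=True, ensure_ascii=False) + "\n")

    payload = data_path.read_bytes()
    manifest = {
        "version": version,
        "files": [
            {
                "path": data_path.name,
                "bytes": len(payload),
                "records": len(episodes),
                "sha256": hashlib.sha256(payload).hexdigest(),
            }
        ],
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

A consumer can then recompute the SHA-256 of the downloaded file and compare it with the manifest entry before using the snapshot.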

Hugging Face is the most natural distribution channel because the end product is a Hub Dataset (versioned downloadable snapshots). I’m looking for direct Hugging Face support (recommended dataset layout + publishing workflow, and ideally storage/bandwidth support as releases scale).

Scale reference: ~700 MB of raw Codex logs → ~36 MB sanitized export (roughly 5% of the raw size).