kaiju-coder-7 / SOURCE_INVENTORY.md

Kaiju Source Inventory

Generated from GitHub source-of-truth repositories plus the requested local RMDW wiki snapshot. This inventory defines what may become Kaiju training data, what is eval-only, and what must stay excluded.

Global Training Rules

  • Do not train on raw secrets, API keys, OAuth tokens, cookies, private keys, or credential files.
  • Do not train on closed-model responses from OpenAI, Anthropic, Gemini, or similar providers unless the terms clearly allow it.
  • Do not train on client-specific private data without explicit review and consent.
  • Preserve repository name, commit SHA, source path, license, and reviewer status for every promoted dataset row.
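A minimal sketch of how the rules above might be checked per dataset row. The field names in `REQUIRED_PROVENANCE`, the `text` key, and the two regex patterns are assumptions made for illustration; the inventory names only the concepts, and a real secret scan would cover far more formats than this.

```python
import re

# Hypothetical field names for the provenance items listed in the rules.
REQUIRED_PROVENANCE = ("repo", "commit_sha", "source_path", "license", "reviewer_status")

# Illustrative patterns only; not a substitute for a dedicated secret scanner.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"),
]


def row_problems(row: dict) -> list[str]:
    """Return human-readable problems for one candidate dataset row."""
    problems = [
        f"missing provenance field: {name}"
        for name in REQUIRED_PROVENANCE
        if not row.get(name)
    ]
    text = row.get("text", "")
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        problems.append("possible secret in text")
    return problems
```

A row with an empty problem list has passed only these two mechanical checks. The closed-model-output and consent rules still need a human reviewer.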

GitHub Repository Inventory

| Repo | SHA | Role | Training use | Required gates | Exclusions | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| RichardEchols/kaiju-coder | 3d57eae92ad5 | model lab, harness, evals, training scripts | candidate-after-review | secret-scan, closed-model-output-check, license-review | runs, models, .secrets, private datasets, raw logs | Use repo-owned harnesses, evals, docs, scripts, and curated datasets. Exclude weights, generated runs, and local secrets. |
| RichardEchols/Kiyomi-7.7.7 | 294b31008135 | business-owner AI-company module contracts | candidate-after-review | secret-scan, closed-model-output-check, private-data-review | credentials, tokens, private client state, closed-model transcripts | Use module contracts, templates, acceptance gates, and owner-facing task structure as high-signal business-owner curriculum. |
| RichardEchols/kiyomi-agent | b192c910f3f7 | business OS wrapper and local-agent patterns | candidate-after-review | secret-scan, closed-model-output-check, private-data-review | credentials, tokens, local runtime state, private support logs | Use architecture, docs, scripts, and safe wrapper patterns. Do not train on runtime secrets or private logs. |
| RichardEchols/rmdw-site | df089dc3b2d3 | public RMDW offer, site, and conversion surface | candidate-after-review | secret-scan, closed-model-output-check, public-copy-review | environment files, deployment secrets, analytics tokens | Use public offer copy, app structure, pricing/CTA patterns, and website implementation patterns. |
| RichardEchols/makotoair | 7568f07fea6e | client website implementation pattern | eval-and-patterns-only | secret-scan, client-data-review, consent-review | client-specific, contact data, contracts, private business details | Use as eval/pattern inspiration for local service business sites. Do not bulk-train on client-specific text without explicit review. |
| RichardEchols/Mezzal-Construction | e8f2eede0405 | client website implementation pattern | eval-and-patterns-only | secret-scan, client-data-review, consent-review | client-specific, contact data, contracts, private business details | Use as eval/pattern inspiration for premium contractor site work. Do not bulk-train on client-specific text without explicit review. |
| RichardEchols/rmdw-agent-wiki | ae1b8e85d3fe | RMDW/Kiyomi operational wiki | selective-reference-only | secret-scan, credentials-redaction, private-data-review, closed-model-output-check | credentials.md, customers.md, raw, contracts, private client notes, support logs | Use only redacted strategy/product notes and documented decisions. Never use raw credentials or private client data. |
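The Exclusions column can be applied mechanically to path-like entries before any human review. The sketch below assumes that an exclusion matches any path component or file name; the inventory does not say how matching is done. The candidate paths `raw/notes.md` and `ops/credentials.md` are made up for illustration.

```python
from pathlib import PurePosixPath

# Path-like exclusion names copied from the rmdw-agent-wiki row. Content
# categories such as "private client notes" cannot be matched by path and
# still need manual review.
WIKI_EXCLUSIONS = {"credentials.md", "customers.md", "raw", "contracts"}


def is_excluded(path: str, exclusions: set[str]) -> bool:
    """True if any component of the repo-relative path is an excluded name."""
    return any(part in exclusions for part in PurePosixPath(path).parts)


# Hypothetical repo-relative paths used only to exercise the filter.
candidates = ["README.md", "raw/notes.md", "ops/credentials.md", "pricing-history.md"]
allowed = [p for p in candidates if not is_excluded(p, WIKI_EXCLUSIONS)]
```

Filtering by name is a first pass only. A file that survives it is a candidate for review, not an approved source.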

Local Source Inventory

Local files are context snapshots, not the source of truth. Promote local wiki material into training only after explicit review, redaction, and either sync/diff against the GitHub wiki or a documented reviewer exception.

| Source | Path | Git repo | Files | Training use | Required gates | Excluded paths present | Safe reference candidates | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RMDW-Wiki-local | /Users/richardecholsai7/Documents/RMDW-Wiki | no | 93 | selective-reference-only | secret-scan, credentials-redaction, private-data-review, sync-or-diff-against-github | credentials.md, customers.md, customers/, raw/ | README.md, kaiju-coder-build-log.md, kaiju-coder-business-plan.md, kaiju-coder-soul.md, kiyomi-agent-build-log.md, pricing-history.md, product/kiyomi-private-ai-workstation.md, ops/product-ops-automation.md, client-acquisition-engine/README.md | Use as a local context snapshot only after explicit row-level review. Do not treat unsynced local files as the authoritative training source. |
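One way to approach the sync-or-diff-against-github gate is to compare content hashes between the local snapshot and a checkout of the GitHub wiki. This is a sketch under assumptions: the function names are hypothetical, both trees are assumed to be plain directories on disk, and the checkout's `.git` directory is not filtered out.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's root-relative POSIX path to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshot(local_root: Path, github_root: Path) -> dict[str, list[str]]:
    """Classify local files as identical, changed, or missing from the checkout."""
    local, remote = file_digests(local_root), file_digests(github_root)
    report = {"identical": [], "changed": [], "local_only": []}
    for rel_path, digest in local.items():
        if rel_path not in remote:
            report["local_only"].append(rel_path)
        elif remote[rel_path] == digest:
            report["identical"].append(rel_path)
        else:
            report["changed"].append(rel_path)
    return report
```

Under this reading, files reported as `changed` or `local_only` would need either a sync or a documented reviewer exception before promotion.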

Training Eligibility Meaning

  • candidate-after-review: source can produce training or eval examples only after secret scanning, closed-model-output review, and row-level provenance capture.
  • eval-and-patterns-only: use for hard eval prompts, harness behavior, screenshots, or generalized patterns. Do not bulk-train on client-specific source text.
  • selective-reference-only: use narrowly after redaction. Treat credentials, customer notes, and raw operational data as excluded by default.
  • Local snapshots require review against the GitHub source of truth before promotion into dataset rows.
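The labels above could be encoded as a small policy table so that tooling refuses a use the label does not permit. The label strings come from the list; the `Use` enum, the permitted-use sets, and the `reviewed` flag are one interpretation of the definitions, not a confirmed policy.

```python
from enum import Enum


class Use(Enum):
    TRAIN = "train"
    EVAL = "eval"
    REFERENCE = "reference"


# Assumed mapping from each training-use label to the uses it permits.
ALLOWED_USES = {
    "candidate-after-review": {Use.TRAIN, Use.EVAL},
    "eval-and-patterns-only": {Use.EVAL},
    "selective-reference-only": {Use.REFERENCE},
}


def may_use(label: str, use: Use, reviewed: bool) -> bool:
    """Allow a use only for a known label and only after review has passed."""
    return reviewed and use in ALLOWED_USES.get(label, set())
```

Unknown labels and unreviewed sources are denied by default, which matches the inventory's exclude-unless-reviewed stance.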

Next Dataset Step

Generate candidate examples only from reviewed paths, attach this inventory SHA or local snapshot data to each row, then run scripts/validate_training_data.py before any training run.