# Kaiju Source Inventory

Generated from GitHub source-of-truth repositories plus the requested local RMDW wiki snapshot. This inventory defines what may become Kaiju training data, what is eval-only, and what must stay excluded.

## Global Training Rules

- Do not train on raw secrets, API keys, OAuth tokens, cookies, private keys, or credential files.
- Do not train on closed-model responses from OpenAI, Anthropic, Gemini, or similar providers unless the terms clearly allow it.
- Do not train on client-specific private data without explicit review and consent.
- Preserve repository name, commit SHA, source path, license, and reviewer status for every promoted dataset row.
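The provenance rule above can be checked mechanically before a row is promoted. The following is a minimal sketch, not the repository's validator: the field names (`repo`, `commit_sha`, `source_path`, `license`, `reviewer_status`) and the example path are assumptions chosen for illustration, since this inventory names the facts to preserve but not a row schema.

```python
# Hypothetical field names for the five provenance facts the rule lists.
REQUIRED_PROVENANCE = ("repo", "commit_sha", "source_path", "license", "reviewer_status")


def missing_provenance(row: dict) -> list[str]:
    """Return the provenance fields that are absent or blank in a dataset row."""
    return [key for key in REQUIRED_PROVENANCE if not str(row.get(key) or "").strip()]


# Example row: the source path is made up, and the license is left blank on purpose.
example_row = {
    "repo": "RichardEchols/kaiju-coder",
    "commit_sha": "3d57eae92ad5",
    "source_path": "docs/example.md",
    "license": "",
    "reviewer_status": "approved",
}
print(missing_provenance(example_row))
```

A row with any missing field would be held back from promotion rather than patched with a guessed value.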

## GitHub Repository Inventory

| Repo | SHA | Role | Training use | Required gates | Exclusions | Notes |
|---|---|---|---|---|---|---|
| [RichardEchols/kaiju-coder](https://github.com/RichardEchols/kaiju-coder) | `3d57eae92ad5` | model lab, harness, evals, training scripts | candidate-after-review | secret-scan, closed-model-output-check, license-review | runs, models, .secrets, private datasets, raw logs | Use repo-owned harnesses, evals, docs, scripts, and curated datasets. Exclude weights, generated runs, and local secrets. |
| [RichardEchols/Kiyomi-7.7.7](https://github.com/RichardEchols/Kiyomi-7.7.7) | `294b31008135` | business-owner AI-company module contracts | candidate-after-review | secret-scan, closed-model-output-check, private-data-review | credentials, tokens, private client state, closed-model transcripts | Use module contracts, templates, acceptance gates, and owner-facing task structure as high-signal business-owner curriculum. |
| [RichardEchols/kiyomi-agent](https://github.com/RichardEchols/kiyomi-agent) | `b192c910f3f7` | business OS wrapper and local-agent patterns | candidate-after-review | secret-scan, closed-model-output-check, private-data-review | credentials, tokens, local runtime state, private support logs | Use architecture, docs, scripts, and safe wrapper patterns. Do not train on runtime secrets or private logs. |
| [RichardEchols/rmdw-site](https://github.com/RichardEchols/rmdw-site) | `df089dc3b2d3` | public RMDW offer, site, and conversion surface | candidate-after-review | secret-scan, closed-model-output-check, public-copy-review | environment files, deployment secrets, analytics tokens | Use public offer copy, app structure, pricing/CTA patterns, and website implementation patterns. |
| [RichardEchols/makotoair](https://github.com/RichardEchols/makotoair) | `7568f07fea6e` | client website implementation pattern | eval-and-patterns-only | secret-scan, client-data-review, consent-review | client-specific, contact data, contracts, private business details | Use as eval/pattern inspiration for local service business sites. Do not bulk-train on client-specific text without explicit review. |
| [RichardEchols/Mezzal-Construction](https://github.com/RichardEchols/Mezzal-Construction) | `e8f2eede0405` | client website implementation pattern | eval-and-patterns-only | secret-scan, client-data-review, consent-review | client-specific, contact data, contracts, private business details | Use as eval/pattern inspiration for premium contractor site work. Do not bulk-train on client-specific text without explicit review. |
| [RichardEchols/rmdw-agent-wiki](https://github.com/RichardEchols/rmdw-agent-wiki) | `ae1b8e85d3fe` | RMDW/Kiyomi operational wiki | selective-reference-only | secret-scan, credentials-redaction, private-data-review, closed-model-output-check | credentials.md, customers.md, raw, contracts, private client notes, support logs | Use only redacted strategy/product notes and documented decisions. Never use raw credentials or private client data. |
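
The "Required gates" column can be treated as a checklist per repository. The sketch below copies the gates for three of the rows above and reports which ones are still outstanding; the function name `outstanding_gates` and the idea of tracking passed gates as a set are assumptions for illustration, not part of any existing script.

```python
# Required gates copied from the table above for three of the listed repositories.
REQUIRED_GATES = {
    "RichardEchols/kaiju-coder": {
        "secret-scan", "closed-model-output-check", "license-review",
    },
    "RichardEchols/makotoair": {
        "secret-scan", "client-data-review", "consent-review",
    },
    "RichardEchols/rmdw-agent-wiki": {
        "secret-scan", "credentials-redaction",
        "private-data-review", "closed-model-output-check",
    },
}


def outstanding_gates(repo: str, passed: set[str]) -> list[str]:
    """Return the gates a repository still has to pass, in sorted order."""
    if repo not in REQUIRED_GATES:
        raise KeyError(f"{repo} is not in the inventory")
    return sorted(REQUIRED_GATES[repo] - passed)


print(outstanding_gates("RichardEchols/kaiju-coder", {"secret-scan"}))
```

Raising on an unknown repository keeps sources that are not in the inventory from passing by default.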

## Local Source Inventory

Local files are context snapshots, not the source of truth. Promote local wiki material into training only after explicit review, redaction, and either sync/diff against the GitHub wiki or a documented reviewer exception.

| Source | Path | Git repo | Files | Training use | Required gates | Excluded paths present | Safe reference candidates | Notes |
|---|---|---:|---:|---|---|---|---|---|
| RMDW-Wiki-local | `/Users/richardecholsai7/Documents/RMDW-Wiki` | no | 93 | selective-reference-only | secret-scan, credentials-redaction, private-data-review, sync-or-diff-against-github | credentials.md, customers.md, customers/, raw/ | README.md, kaiju-coder-build-log.md, kaiju-coder-business-plan.md, kaiju-coder-soul.md, kiyomi-agent-build-log.md, pricing-history.md, product/kiyomi-private-ai-workstation.md, ops/product-ops-automation.md, client-acquisition-engine/README.md | Use as a local context snapshot only after explicit row-level review. Do not treat unsynced local files as the authoritative training source. |
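
A path filter is one way to enforce the "Excluded paths present" column when walking the local snapshot. This is a hedged sketch: the entries come from the table above, but the matching policy (a trailing slash means a directory, and a match anywhere in the tree excludes the file) is a conservative assumption rather than a documented rule.

```python
from pathlib import PurePosixPath

# Entries from the "Excluded paths present" column; a trailing slash marks a directory.
EXCLUDED = ("credentials.md", "customers.md", "customers/", "raw/")


def is_excluded(rel_path: str, excluded: tuple[str, ...] = EXCLUDED) -> bool:
    """Return True if a snapshot-relative path falls under an excluded entry."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    for entry in excluded:
        if entry.endswith("/"):
            # Directory entry: exclude anything nested under a directory with that name.
            if entry.rstrip("/") in parts[:-1]:
                return True
        elif parts[-1] == entry:
            # File entry: exclude by file name at any depth (assumed, conservative).
            return True
    return False


for path in ("raw/notes.md", "customers.md", "pricing-history.md"):
    print(path, is_excluded(path))
```

Passing this filter does not make a file eligible; it only removes paths that are excluded outright before row-level review begins.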

## Training Eligibility Meaning

- `candidate-after-review`: source can produce training or eval examples only after secret scanning, closed-model-output review, any additional gates listed for that source, and recording of row-level provenance.
- `eval-and-patterns-only`: use for hard eval prompts, harness behavior, screenshots, or generalized patterns. Do not bulk-train on client-specific source text.
- `selective-reference-only`: use narrowly after redaction. Treat credentials, customer notes, and raw operational data as excluded by default.
- Local snapshots require review against the GitHub source of truth before promotion into dataset rows.
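
The three labels above can be encoded as a lookup so that tooling never has to interpret them ad hoc. The mapping below is one reading of the definitions, not an authoritative policy: the use names (`train`, `eval`, `patterns`, `reference`) and the single `gates_passed` flag are assumptions made for the sketch.

```python
# One interpretation of the eligibility labels; the use names are hypothetical.
ALLOWED_USES = {
    "candidate-after-review": {"train", "eval"},
    "eval-and-patterns-only": {"eval", "patterns"},
    "selective-reference-only": {"reference"},
}


def allowed_uses(label: str, gates_passed: bool) -> set[str]:
    """Return the uses a source permits; nothing is allowed until its gates pass."""
    if label not in ALLOWED_USES:
        raise ValueError(f"unknown eligibility label: {label}")
    return set(ALLOWED_USES[label]) if gates_passed else set()


print(sorted(allowed_uses("eval-and-patterns-only", gates_passed=True)))
```

Returning an empty set when gates have not passed makes "excluded by default" the fallback behavior for every label.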

## Next Dataset Step

Generate candidate examples only from reviewed paths, attach this inventory SHA or local snapshot data to each row, then run `scripts/validate_training_data.py` before any training run.