Governing Self-Modification - A Charter for the Pattern-Learning Bridge

Community article, published December 31, 2025

Draft v0.1 — Non-normative supplement to PLB / PLB-M / SI-Core / SI-NOS


This document is non-normative. It explains how to govern a Pattern-Learning Bridge (PLB) that can propose changes to its own system — sometimes including code.

Normative guarantees live in the SI-Core spec, SI-NOS design, PLB base design, and the constitutional / ethics documents.


1. What we are actually governing

This is not about “an AI that wakes up and rewrites everything overnight”.

It’s about something much more boring — and therefore much more dangerous:

A structured process that mines patterns from incidents and telemetry, proposes changes to the system (policies, thresholds, code), and sometimes auto-applies those changes.

In earlier docs:

  • PLB = Pattern-Learning Bridge

    • mines structural patterns from [OBS]/[MEM]/EthicsTrace,

    • proposes patches to:

      • SIL functions,
      • ethics / risk policies,
      • saga compensators,
      • semantic compression configs,
      • GCS estimators.
  • PLB-M = market + negotiation + learning

    • multiple agents negotiate Goal Trade Proposals (GTPs),
    • PLB learns better trade patterns.

This charter answers four questions:

  1. Scope — what is PLB allowed to touch?
  2. Budget — how much change is allowed, and how fast?
  3. Metrics — how do we know PLB is helping, not harming?
  4. Red lines — what remains strictly human / constitutional?

Think of it as:

“If you’re going to let a system help rewrite itself, this is the minimum structure you owe yourself.”


2. Threat model for self-modification

We assume PLB is not “evil”; it’s just a powerful optimiser with imperfect models.

Things that can go wrong:

  1. Over-fitting to recent pain

    • A cluster of recent incidents pushes PLB to propose:

      • over-conservative policies,
      • huge safety margins,
      • or too-aggressive compression.
    • The system becomes sluggish or over-cautious in the long run.

  2. Hidden goal drift

    • Small patches accumulate.

    • Over months, effective behaviour shifts from:

      • “minimize flood risk under fairness constraints” to
      • “minimize any legal risk, ignore service quality”.
    • Nobody can point to a single “bad patch”; drift came from many tiny changes.

  3. Governance bypass

    • PLB manages to:

      • patch modules outside its allowed zone,
      • reduce the strictness of its own checks,
      • weaken logging around its changes.
  4. Coupled PLBs

    • Multiple PLBs (per domain / per org) learn off each other’s behaviour.
    • A bad “pattern” propagates through PLB-M markets into several systems at once.
  5. Human over-trust

    • “It passed sandbox, PLB proposed it, metrics look okay — must be fine.”
    • Humans stop asking hard questions.

The charter is a response to this threat model: assume good intentions, prevent structural misbehaviour.


3. Self-Modification Budgets

We need a concept more granular than “allowed / forbidden”:

Self-modification budget = the envelope of change a PLB is allowed to make, without triggering higher-level governance.

You can think of three dimensions:

  1. Scope budget — which artifacts can be changed?

    Example buckets:

    • local heuristics,
    • thresholds / weights,
    • compensator parameters,
    • SIL functions,
    • ethics policies,
    • constitutional norms.
  2. Magnitude budget — how big can a change be?

    Examples:

    • at most Δ≤X% change in a parameter,
    • at most N lines changed in a SIL function,
    • at most K% change in a GCS floor,
    • at most M% shift in semantic compression ε-budget.
  3. Rate budget — how often and how quickly can changes happen?

    Examples:

    • no more than R auto-applied patches per day,
    • cool-down periods after rollbacks,
    • per-goal change rate limits.

A non-normative configuration sketch:

# non-normative sketch (implementation-friendly)
# NOTE:
# - "rate" contains frequency/window controls.
# - Zone C is "suggest-only" by policy; avoid "inf" sentinels.

plb_self_mod_budget:
  zone_A:  # local heuristics, low-risk (auto-apply allowed)
    scope:
      artifacts:
        - heuristic_params
        - noncritical_saga_timeouts
    magnitude:
      max_param_delta_pct: 10.0
      max_lines_changed: 20
    rate:
      max_patches_per_week: 50
      min_hours_between_patches: 0.1
    requires:
      sandbox_validation: true
      canary_deploy: false
      human_signoff: false

  zone_B:  # shared operational policies (bounded; human+eval required)
    scope:
      artifacts:
        - risk_policies
        - compression_profiles
        - saga_compensators
        - failure_classifiers
    magnitude:
      max_param_delta_pct: 5.0
      max_lines_changed: 50
    rate:
      max_patches_per_week: 10
      min_hours_between_patches: 6
    requires:
      sandbox_validation: true
      canary_deploy: true
      human_signoff: true

  zone_B_plus:  # goal-native algorithms (shadow-only auto-apply)
    scope:
      artifacts:
        - gcs_estimators
        - multi_goal_schedulers
        - fairness_constraints
    magnitude:
      max_param_delta_pct: 2.0
      max_lines_changed: 30
    rate:
      max_patches_per_week: 5
      min_hours_between_patches: 24
    requires:
      shadow_only_auto_apply: true
      sandbox_validation: true
      human_signoff: true
      governance_review_for_prod: true

  zone_C:  # ethics + constitution (suggest-only; human process)
    scope:
      artifacts:
        - eth_policies
        - constitutional_norms
    magnitude:
      max_param_delta_pct: 0.0
      max_lines_changed: 0
    rate:
      auto_apply: false
    requires:
      plb_may_suggest_only: true
      human_assembly: true
      constitutional_process: true

Informally:

  • Zone A — PLB can auto-patch within budget.
  • Zone B — PLB can suggest; auto-apply only with human sign-off + sandbox.
  • Zone B+ — auto-apply only in shadow mode; production promotion needs governance review.
  • Zone C — PLB can only point at problems, never apply patches.

3.1 Budget violation detection and response (non-normative)

Self-modification budgets only matter if violations are visible and trigger predictable responses.

Important authority rule

  • PLB may detect and report budget violations.
  • A separate governance-owned controller (or human operators) is the only entity allowed to:
    • change budgets,
    • change PLB mode (normal | suggest_only | paused),
    • promote/demote artifacts across zones.

Detection mechanisms (a pre-apply check is sketched after this list):

  • Pre-apply checks
    Each patch is checked against:

    • scope budget (allowed artifact types),
    • magnitude budget (Δ bounds, line-change bounds),
    • rate budget (per-window limits).
  • Cumulative tracking
    Maintain rolling counters per zone:

    • total Δ applied in a sliding window,
    • patches per goal / per artifact class,
    • rollback density (for cooldown decisions).
  • Cross-PLB cross-check
    When multiple PLBs operate on overlapping artifacts, a coordinator outside PLB validates that the combined change remains within budgets.
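
A minimal sketch of the pre-apply check referenced above. All names (Patch, ZoneBudget, check_budgets) are hypothetical; in a real deployment this logic belongs to the governance-owned controller, not to PLB itself:

from dataclasses import dataclass
from typing import List, Set

@dataclass
class ZoneBudget:
    artifacts: Set[str]            # scope budget: allowed artifact types
    max_param_delta_pct: float     # magnitude budget
    max_lines_changed: int
    max_patches_per_week: int      # rate budget

@dataclass
class Patch:
    artifact_type: str
    param_delta_pct: float
    lines_changed: int

def check_budgets(patch: Patch, budget: ZoneBudget,
                  patches_this_week: int) -> List[str]:
    """Return violation labels; an empty list means the patch is within budget."""
    violations = []
    if patch.artifact_type not in budget.artifacts:
        violations.append("scope")       # hard reject + escalate
    if abs(patch.param_delta_pct) > budget.max_param_delta_pct:
        violations.append("magnitude")   # reject, or clamp in Zone A only
    if patch.lines_changed > budget.max_lines_changed:
        violations.append("magnitude")
    if patches_this_week + 1 > budget.max_patches_per_week:
        violations.append("rate")        # queue / enforce cooldown
    return violations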

Violation types and responses:

  1. Scope violation

    • Detection: patch targets an artifact type outside the configured zone.
    • Response:
      • hard reject (no partial application),
      • escalate to governance review,
      • record the attempt in [MEM]/audit.
  2. Magnitude violation

    • Detection: proposed Δ exceeds bounds.
    • Response:
      • Zone A only: optional clamp-to-budget (e.g. 15% → 10%), with the original proposal preserved in audit.
      • Zone B / B+ / C: reject and require a revised proposal (no automatic clamping).
    • Rationale: clamping in higher zones can become an unintended bypass of sign-off.
  3. Rate violation

    • Detection: too many patches in a time window or insufficient cooldown since last rollback.
    • Response:
      • queue non-urgent patches,
      • enforce cooldown,
      • governance controller may downgrade the zone’s mode to suggest_only.

Progressive sanctions on repeated violations (executed by the governance controller; sketched as a state machine below):

  1. record-only (metrics + alerts),
  2. temporary budget tightening for that PLB and zone,
  3. downgrade to suggest_only,
  4. full pause of PLB auto-apply (paused).

Automatic recovery (governance-controlled):

  • After a configured “clean window” with no violations and stable meta-metrics:
    • budgets may be restored gradually, or
    • restored immediately with explicit human approval.
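
The sanction ladder and recovery rules can be read as a small state machine. A non-normative sketch with illustrative names; only the governance controller may call these transitions:

SANCTION_LADDER = ["record_only", "tighten_budgets", "suggest_only", "paused"]

def on_violation(current: str) -> str:
    """Escalate one step per repeated violation, saturating at 'paused'."""
    i = SANCTION_LADDER.index(current)
    return SANCTION_LADDER[min(i + 1, len(SANCTION_LADDER) - 1)]

def on_clean_window(current: str, human_approved: bool = False) -> str:
    """Step budgets back gradually; jump straight back only with human approval."""
    if human_approved:
        return SANCTION_LADDER[0]
    i = SANCTION_LADDER.index(current)
    return SANCTION_LADDER[max(i - 1, 0)]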

4. Layers of code and who may change them

We need a change ladder.

Terminology note (to avoid zone drift)

  • Zones in the budget config are: A / B / B+ / C.
  • “C++” is a rhetorical label for the constitution subset of Zone C (it is not a separate budget zone).

From bottom to top:

  1. Execution heuristics (Zone A)

    • Local retry counts, decay rates, thresholds.
    • Local semantic compression knobs for low-stakes streams.
    • Goal: “Reduce incident frequency / noise without touching core logic.”

    Allowed:

    • PLB may propose + auto-apply within strict budget.
    • Rollbacks cheap (RML-2), impact local.
  2. Operational policies & compensators (Zone B)

    • Risk profiles, saga compensator logic, failure classification.
    • Goal: “Change how we respond to incidents, not what we fundamentally optimise for.”

    Allowed:

    • PLB may propose.
    • Auto-apply only after: schema/tests → sandbox → human sign-off → canary.
  3. Goal-native algorithms (Zone B+)

    • GCS estimators, multi-goal schedulers, fairness constraints.
    • Goal: “Refine how we measure and balance goals, within fixed constitutional commitments.”

    Allowed:

    • PLB may propose improvements.
    • Auto-apply is permitted only in shadow mode (no live external effects).
    • Promotion to production requires governance review + human sign-off.
  4. Ethics policies (Zone C — ethics subset)

    • Non-discrimination, safety bounds, consent requirements.
    • Goal: “Express human / societal values in machine-checkable form.”

    Allowed:

    • PLB may surface anomalies and generate drafts for human consideration.
    • PLB may not commit changes or weaken constraints.
  5. Constitutional norms (Zone C — constitution subset / “C++”)

    • Fundamental rights, jurisdictional rules, cross-organization commitments.

    Allowed:

    • PLB may flag conflicts / blind spots and propose problem statements.
    • Any concrete patch is human-only via constitutional process.

Short version:

The higher you go, the more PLB becomes a diagnostic tool, not an actor.


5. Meta-metrics for PLB itself

A “smart” system without meta-metrics is just a very opaque error-generator.

PLB should be treated as its own governed subsystem, with health metrics such as:

  1. Proposal Adoption Rate (PAR)

    • fraction of proposals that:

      • pass validation,
      • get deployed (even canary),
      • remain in place after observation window.

    Very low PAR → PLB is noise / overfitting. Very high PAR + frequent incidents → governance is asleep.

  2. Rollback Rate (RR)

    • fraction of PLB-induced patches that are reverted:

      • due to regressions,
      • due to side-effects,
      • due to governance concerns.
  3. Sandbox–Production Agreement (SPA)

    • for PLB-induced changes validated in sandbox:

      • how often does production behaviour match sandbox predictions?
    • low SPA → sandbox / test coverage is inadequate.

  4. Incident Resolution Delta (IRD)

    • across similar incident classes, before vs after PLB patches:

      • time-to-resolution,
      • user impact,
      • recurrence rate.

    If IRD moves the wrong way (time-to-resolution, impact, or recurrence higher after PLB patches, i.e. a positive delta), that’s a red flag.

  5. Drift & Bias Indicators

    • does PLB:

      • systematically propose stricter policies in some regions / cohorts?
      • move GCS trade-offs in a particular direction?
    • tracked via fairness / distributional metrics.

A very rough sketch:

{
  "plb_id": "PLB-city-core",
  "window": "2028-04",
  "proposal_count": 120,
  "proposal_adoption_rate": 0.42,
  "rollback_rate": 0.06,
  "sandbox_prod_agreement": 0.91,
  "incident_resolution_delta": {
    "flood_risk": -0.23,
    "traffic_routing": -0.08
  },
  "drift_signals": {
    "gcs_safety_bias": "+0.03",
    "gcs_efficiency_bias": "-0.02"
  }
}

If PLB cannot keep these numbers within agreed bands, it becomes the subject of governance — up to and including pausing it.
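
What “agreed bands” might look like over the report format above, as a non-normative sketch; the band values are illustrative, not recommendations:

BANDS = {
    "proposal_adoption_rate": (0.10, 0.90),  # too low: noise; too high + incidents: asleep
    "rollback_rate": (0.00, 0.10),
    "sandbox_prod_agreement": (0.80, 1.00),
}

def out_of_band(report: dict) -> list:
    """Return metric names outside their bands; non-empty triggers governance review."""
    flagged = [m for m, (lo, hi) in BANDS.items()
               if not (lo <= report.get(m, lo) <= hi)]
    # Deltas are (after - before) on lower-is-better quantities,
    # so positive values mean things got worse after PLB patches.
    for goal, delta in report.get("incident_resolution_delta", {}).items():
        if delta > 0:
            flagged.append(f"incident_resolution_delta:{goal}")
    return flagged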


6. The PLB governance pipeline

Every PLB-driven change should follow a structured pipeline. No “mysterious hotfixes”.

  1. Sensing

    • PLB monitors:

      • incidents,
      • near-misses,
      • structural anomalies,
      • metric drifts.
  2. Pattern Mining

    • clustering similar incidents,
    • extracting structural conditions,
    • inferring “candidate explanations”.
  3. Draft Proposal

    • produce a structured patch proposal:

      • patch type (policy, code, config),
      • target artifact,
      • expected effect on metrics / GCS,
      • safety constraints.
  4. Validation Stack

    At minimum:

    • schema & type checks,
    • conformance tests (existing test suite),
    • sandbox replay on historical incidents,
    • golden-diff on SIR / logs,
    • [ETH] / constitutional compatibility.
  5. Governance Decision

    Based on:

    • affected zone (A / B / B+ / C),
    • risk profile,
    • magnitude budget.

    Possible outcomes (a routing sketch follows this pipeline):

    • AUTO-APPLY (Zone A, within budget)
    • CANARY WITH HUMAN SIGN-OFF (Zone B)
    • SHADOW-ONLY AUTO-APPLY (Zone B+; production promotion via governance review)
    • RESEARCH-ONLY / REJECT (does not meet constraints)
    • ESCALATE TO BOARD / CONSTITUTIONAL PROCESS (Zone C impact)
  6. Deployment & Monitoring

    • canary rollout,
    • metric watch (including PLB meta-metrics),
    • automatic rollback triggers.
  7. Retrospective

    • did the change achieve its claimed benefits?
    • were side-effects acceptable?
    • does PLB need retuning (thresholds, budgets)?

Every step is itself observable: PLB’s pipeline is a first-class citizen of [OBS]/[ID]/[ETH]/[MEM]/[EVAL].
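
The decision in step 5 can be summarised as a routing function. A non-normative sketch; the outcome labels mirror the list above:

def route_proposal(zone: str, within_budget: bool, validated: bool) -> str:
    """Map (zone, budget status, validation result) to a pipeline outcome."""
    if not validated:
        return "RESEARCH_ONLY_OR_REJECT"
    if zone == "A":
        return "AUTO_APPLY" if within_budget else "RESEARCH_ONLY_OR_REJECT"
    if zone == "B":
        return "CANARY_WITH_HUMAN_SIGNOFF"
    if zone == "B+":
        return "SHADOW_ONLY_AUTO_APPLY"  # production promotion: governance review
    return "ESCALATE_TO_CONSTITUTIONAL_PROCESS"  # Zone C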

6.1 Multi-PLB coordination

In realistic deployments, more than one PLB may operate:

  • per domain (traffic, flood, grid),
  • per organization (city vs operator),
  • per layer (local vs global PLB-M).

Coordination is needed when:

  • multiple PLBs propose changes to the same artifact,
  • proposals have cross-dependencies,
  • changes from one PLB materially affect the patterns another PLB sees.

Coordination mechanisms:

  1. Patch ordering

    • Priority by intent:
      • safety / compliance > reliability > efficiency.
    • Secondary order:
      • timestamp (first-come),
      • organizational authority level,
      • risk profile of the target artifact.
  2. Conflict resolution

    • Detect structurally conflicting patches:
      • overlapping lines / keys,
      • incompatible parameter ranges.
    • For non-conflicting patches:
      • merge automatically, then test.
    • For conflicting patches:
      • route to human arbitration with:
        • both PLB rationales,
        • sandbox results for each patch alone and combined.
  3. Impact analysis

    • Before applying a batch:
      • run sandbox simulations with different subsets,
      • compute expected impact on key metrics and goals,
      • track cross-PLB interactions over time.
  4. Cooperative patterns

    • Shared pattern libraries:
      • once one PLB discovers a robust pattern, others can reuse it under their own governance.
    • Shared failure patterns:
      • when a PLB-induced change is rolled back, other PLBs are notified to avoid similar proposals.

A sketch of a coordination loop:

def coordinate_patches(plb_proposals):
    """Order, sandbox-test, and apply multi-PLB patches; defer anything that fails."""
    # detect_conflicts / human_arbitration / prioritize / sandbox_test /
    # defer_for_review are sketch-level placeholders.
    conflicts = detect_conflicts(plb_proposals)
    if conflicts:
        return human_arbitration(conflicts)

    ordered = prioritize(plb_proposals)  # safety/compliance > reliability > efficiency
    applied = []
    for patch in ordered:
        if sandbox_test(patch, context=applied):
            governance_controller.apply(patch)  # controller owns budget/mode enforcement + audit logging
            applied.append(patch)
        else:
            defer_for_review(patch)
    return applied

7. Red lines and structural constraints

There are things PLB must never be allowed to do, no matter how clever it gets.

At minimum:

  1. No patching of constitutional / ETH core

    • PLB cannot:

      • modify constitutional norms,
      • change fundamental ethics rules,
      • weaken hard floors (e.g. safety GCS floors).

    It may only:

    • annotate,
    • raise anomalies,
    • suggest problem statements.
  2. No self-budget editing

    • PLB cannot change:

      • its own plb_self_mod_budget,
      • its own meta-metric thresholds,
      • its own evaluation pipeline.

    Those are governance-owned.

  3. No unlogged changes

    • every PLB proposal and applied patch must:

      • be a jump with [ID]/[OBS]/[ETH]/[MEM],
      • include full diffs and metrics,
      • be auditable retroactively.
  4. No bypass of rollback infrastructure

    • PLB must:

      • register each effect with RML-2/3 properly,
      • provide compensators for its own changes,
      • not mark anything as “irreversible” unless governance explicitly agreed.
  5. No privilege escalation

    • PLB cannot:

      • gain write access to new artifacts without governance update,
      • change its own deployment boundaries,
      • push patches directly to infra / hardware.

Think of PLB as a very smart junior engineer:

It can debug, propose, even auto-patch small things — but it does not own the repo, the production cluster, or the constitution.

7.1 Adversarial patch detection

Even if PLB itself is designed as a good-faith optimiser, its inputs and environment may not be. We need to detect adversarial influence on PLB proposals.

Threat vectors:

  1. Data poisoning

    • Incident logs / telemetry manipulated to induce misleading patterns.
  2. Trojan patches

    • Patches that look reasonable but embed harmful logic or backdoors.
  3. Gradual privilege escalation

    • Sequence of small, seemingly harmless changes that effectively expand PLB’s power.
  4. Metric / log manipulation

    • Attempts to hide regressions or make bad patches look successful.

Detection strategies (an outlier check is sketched after this list):

  1. Anomaly detection

    • Statistical outlier detection on:
      • patch magnitude and scope,
      • targeted artifacts,
      • expected vs observed metric impact.
    • Flag proposals that differ qualitatively from PLB’s historical behaviour.
  2. Formal / structural checks

    • Enforce invariants:
      • type safety,
      • security properties,
      • constitutional constraints.
    • Reject any patch that would weaken these, regardless of apparent benefit.
  3. Redundant validation

    • Independent sandboxes:
      • validate patches in separately maintained environments.
    • Cross-PLB comparison:
      • if a patch cannot be reproduced by a second PLB on the same data, treat as suspicious.
    • Human sampling:
      • random subset of patches receive deep human review, regardless of risk score.
  4. Canary deployment

    • Always roll out risky patches to small, well-instrumented slices:
      • monitor for anomalous behaviour,
      • automatic rollback on threshold breaches.
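
For the anomaly-detection strategy above, even a plain z-score over PLB’s own history makes a useful first filter. A non-normative sketch; the threshold and minimum history length are illustrative:

from statistics import mean, stdev

def is_outlier(history: list, value: float, z_threshold: float = 3.0) -> bool:
    """Flag `value` (e.g. a patch's magnitude) if it deviates strongly
    from this PLB's historical behaviour."""
    if len(history) < 10:                # too little history: treat as suspicious
        return True
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold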

Response:

  • Isolate suspicious patches:
    • do not auto-apply,
    • add to a review queue.
  • Audit related proposals:
    • identify whether a series of patches shares a suspicious pattern.
  • Inspect PLB training / data:
    • look for poisoning or systemic bias.
  • If needed:
    • retrain or reconfigure PLB,
    • tighten self-modification budgets,
    • temporarily move to suggest-only mode.

8. Handling PLB failures & emergency stop

If PLB itself misbehaves, we need a big red button (a pause-and-rollback sketch follows the list below).

  1. Pause switch

    • plb_mode: {normal | suggest_only | paused}

    • when paused:

      • PLB may still analyze and log,
      • but no patches are auto-applied,
      • humans can still read its suggestions.
  2. Global rollback of PLB-induced changes

    • maintain a list of:

      • patches tagged origin: plb,
      • their causal effects.
    • allow “rollback PLB window” commands:

      • revert all PLB-induced changes from last N hours/days,
      • via RML-2/3 compensators.
  3. PLB Self-Audit

    • when metrics show:

      • high rollback rate,
      • negative IRD,
      • or suspicious drift,
    • trigger a PLB self-audit run:

      • freeze new proposals,
      • re-analyze proposal history,
      • produce a “what went wrong” report.
  4. Governance escalation

    • if PLB shows repeated problematic behaviour:

      • governance can:

        • shrink its budgets,
        • move more artifacts into Zone C,
        • or decommission PLB entirely.
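
A minimal sketch of the pause switch and window rollback mentioned above. All names are hypothetical; real reverts would go through RML-2/3 compensators and the usual audit machinery:

from datetime import datetime, timedelta

plb_mode = "normal"  # {normal | suggest_only | paused}

def emergency_pause(reason: str) -> None:
    """Stop all auto-apply; analysis and logging continue."""
    global plb_mode
    plb_mode = "paused"
    print(f"[audit] emergency_stop reason={reason}")  # stand-in for [ID]/[MEM] logging

def rollback_plb_window(patch_log: list, hours: int) -> None:
    """Revert every patch tagged origin=plb applied in the last `hours` hours."""
    cutoff = datetime.utcnow() - timedelta(hours=hours)
    for patch in reversed(patch_log):                 # undo in reverse order
        if patch["origin"] == "plb" and patch["applied_at"] >= cutoff:
            patch["compensator"]()                    # RML-2/3 compensator hook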

9. Relationship to human governance & constitution

It’s tempting to say:

“If PLB is good enough, humans can get out of the loop.”

This charter assumes the opposite:

  • Humans own goals and constitutions.

  • PLB is tooling for:

    • pattern detection,
    • hypothesis generation,
    • and small, reversible improvements.

Over time, a healthy posture looks like:

  • PLB automates:

    • local tuning,
    • low-risk policy tweaks,
    • boilerplate patch generation.
  • Humans and governance bodies focus on:

    • conflicts between stakeholders,
    • long-term goals and norms,
    • cross-domain and cross-organization impacts.

Or in one line:

PLB accelerates learning; the constitution decides what “better” means.

9.1 Human override protocols

Humans remain the ultimate governors of PLB behaviour. The system should make it easy and traceable for humans to intervene.

Override types:

  1. Emergency stop

    • Who: on-call ops, security, or designated leadership roles.
    • Trigger: major incident, suspected security compromise, constitutional concern.
    • Effect: immediate switch of PLB to paused (no auto-apply).
    • Logging:
      • [ID]: origin = human_override/emergency_stop
      • [MEM]: full context, including reason and author.
  2. Patch rejection

    • Who: domain experts, ethics / governance boards.
    • Reason: technical risk, ethical concern, policy conflict.
    • Effect:
      • specific patch is permanently blocked,
      • PLB receives structured feedback (reason code / rationale).
    • Benefit: PLB can learn “classes of unacceptable proposals”.
  3. Budget adjustment

    • Who: governance committee or equivalent.
    • Reason: metrics degradation, strategic change, regulatory updates.
    • Effect:
      • zones can be reclassified,
      • budgets tightened or relaxed,
      • changes themselves logged and subject to review.
  4. Human-initiated patch

    • Who: security team or ops in emergencies.
    • Condition: cannot wait for PLB proposal (e.g. critical vulnerability).
    • Requirements:
      • patch labelled as origin: human_override,
      • subject to post-hoc review and audit,
      • considered when recalibrating PLB suggestions.

Interfaces (the shared controller is sketched after this list):

  • CLI:
    • plb pause --reason "incident-042"
    • plb budget set zone_B.max_param_delta_pct 2.5
  • Dashboard:
    • one-click pause / downgrade to suggest-only,
    • per-zone budget sliders with confirmation.
  • API:
    • programmatic control for integration with incident tooling.
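
A sketch of the controller these interfaces share. Everything here is hypothetical; the point is structural: one audited entry point for pauses, budget changes, and mode switches:

class GovernanceController:
    """Single governance-owned entry point for human overrides."""
    def __init__(self):
        self.mode = "normal"
        self.budgets = {}

    def pause(self, actor: str, reason: str) -> None:
        self.mode = "paused"
        self._audit(actor, "pause", reason)

    def set_budget(self, actor: str, key: str, value: float, reason: str) -> None:
        self.budgets[key] = value  # e.g. "zone_B.max_param_delta_pct"
        self._audit(actor, f"budget:{key}={value}", reason)

    def _audit(self, actor: str, action: str, reason: str) -> None:
        # Stand-in for the [OBS]/[ID]/[MEM] machinery mentioned below.
        print(f"[audit] actor={actor} action={action} reason={reason}")

# Mirrors the CLI examples above:
controller = GovernanceController()
controller.pause(actor="oncall", reason="incident-042")
controller.set_budget("gov-board", "zone_B.max_param_delta_pct", 2.5,
                      reason="metrics degradation")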

All overrides pass through the same [OBS]/[ID]/[MEM] machinery as PLB proposals. These commands act via the governance-owned controller (not direct PLB self-edit).


10. Adoption roadmap

How do you gradually give PLB more power without jumping off a cliff?

A pragmatic sequence (phase gates are sketched as data after this list):

  1. Phase 0 — Read-Only Advisor

    • PLB runs in suggest-only mode.
    • Humans manually implement any accepted insights.
    • Metrics: PLB usefulness, not safety.
  2. Phase 1 — Auto-patch in Zone A

    • Allow PLB to auto-tune low-risk heuristics.
    • Strict budgets + fast rollbacks.
    • Monitor PLB meta-metrics.
  3. Phase 2 — Co-pilot in Zone B

    • PLB proposes policy / GCS / compression patches.
    • Human + sandbox gatekeeper.
    • Partial automation of validation pipeline.
  4. Phase 3 — PLB-M integration

    • Multi-agent negotiation and semantic pricing feed into PLB patterns.
    • Strong governance on cross-agent impacts.
  5. Phase 4 — Constitutional Partner (diagnostic)

    • PLB becomes a standard tool:

      • to scan for constitutional blind spots,
      • to simulate impact of proposed amendments,
      • to highlight structural risks.

    It still never writes the constitution itself.
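
The phase gates can be captured as plain data, as referenced above. A non-normative sketch; the field names are illustrative, and advancing a phase is a governance decision, never a PLB one:

PHASE_GATES = {
    0: {"plb_mode": "suggest_only", "auto_apply_zones": []},
    1: {"plb_mode": "normal", "auto_apply_zones": ["A"]},
    2: {"plb_mode": "normal", "auto_apply_zones": ["A"],
        "copilot_zones": ["B"]},
    3: {"plb_mode": "normal", "auto_apply_zones": ["A"],
        "copilot_zones": ["B", "B+"], "plb_m_enabled": True},
    4: {"plb_mode": "normal", "auto_apply_zones": ["A"],
        "copilot_zones": ["B", "B+"], "plb_m_enabled": True,
        "constitutional_diagnostics": True},
}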


Closing

“Self-modifying AI” sounds like science fiction.

In practice, it’s mostly:

  • pattern mining over incident logs,
  • patch proposal workflows,
  • and an ever-growing pile of YAML.

This charter’s claim is simple:

If you are going to let a system generate and apply patches to itself, treat that ability as a governed resource. Give it scopes, budgets, metrics, and red lines — especially around ethics and constitutions.

Only then does “self-improvement” become a feature, and not a polite word for “uncontrolled drift”.
