---
title: Bit Vector Tensor Control Policy
emoji: "🧭"
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "4.44.1"
python_version: "3.11"
app_file: app.py
pinned: false
---

# Bit/Vector/Tensor Control Policy

Clean product one-liner: a standalone control-system repo with one local graph kernel stack, one conversational API, one orchestrator UX, and first-class control benchmarks driven by bits, vectors, and tensors.

Layman version: this should be one self-contained machine, not a folder of pointers to other machines.

## Inference Surface

Clean product one-liner: graph-first reasoning can now be delegated to the local Codex CLI through one repo-root YAML config.

| Surface | What it does | Why it matters |
| --- | --- | --- |
| `inference.yaml` | selects the inference backend and model | keeps inference policy in repo, not hidden in shell history |
| `scripts/run_codex_inference.py` | runs schema-shaped `codex exec` turns | uses the signed-in ChatGPT subscription path |
| `api/run_turn.sh` | keeps lane choice and response packet deterministic | model output is bounded by runtime policy, not vice versa |

## Self-Improve Loop

Clean product one-liner: the repo can now propose one bounded self-improvement through Codex CLI, package it as a manifest, and optionally apply and benchmark it.

| Surface | What it does | Why it matters |
| --- | --- | --- |
| `self_improve.yaml` | defines allowed roots and default benchmark | keeps self-edit policy explicit |
| `scripts/propose_self_improvement.py` | uses Codex CLI to emit a bounded manifest proposal | the model proposes changes as structured state, not loose prose |
| `scripts/run_self_improve.py` | writes proposal artifacts, optionally applies them, and runs the benchmark | self-building stays receipt-backed and reviewable |
| `bvtctl self-improve` | operator front door for proposal-only runs | lets the system suggest its next bounded change |
| `bvtctl self-improve-apply` | proposal + apply + benchmark | closes the loop without opening broad ungated recursion |
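
The "allowed roots" idea can be illustrated with a small path check. This is a minimal sketch, not the repo's implementation: the root names, the manifest shape (a `paths` list), and both function names are assumptions made for illustration, since the actual `self_improve.yaml` schema is not shown here.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical allowed roots; the real list would live in self_improve.yaml.
ALLOWED_ROOTS = ("policy", "benchmarks")


def is_allowed(path, allowed_roots=ALLOWED_ROOTS):
    """Return True only if a repo-relative path stays inside an allowed root."""
    # Collapse "a/../b" segments first so traversal cannot hide behind a root.
    normalized = PurePosixPath(posixpath.normpath(path))
    if normalized.is_absolute() or ".." in normalized.parts:
        return False
    return any(normalized.is_relative_to(root) for root in allowed_roots)


def rejected_paths(manifest, allowed_roots=ALLOWED_ROOTS):
    """List every proposed path that falls outside the allowed roots."""
    return [p for p in manifest["paths"] if not is_allowed(p, allowed_roots)]


# Hypothetical proposal manifest with one in-bounds and two out-of-bounds edits.
proposal = {
    "paths": [
        "policy/control_language_v0.json",
        "policy/../bin/bvtctl",
        "/etc/passwd",
    ]
}
print(rejected_paths(proposal))
```

Rejecting the whole proposal when this list is non-empty would keep the apply step fail-closed, in the same spirit as the execution gate.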

## What This Is

Pareto read:

| Question | Accurate answer |
| --- | --- |
| are we an agent harness? | yes |
| metacognitive? | yes |
| metacybernetic? | yes |
| already a federated learning runtime? | not in the strict ML sense |

Better wording:

This repo is a **standalone agent harness** that should operate:

- metacognitively: inspect state, confidence, and reasoning posture
- metacybernetically: regulate routes, gates, escalation, and proof loops
- cross-surface adaptively: distill patterns across memory, runtime, API, UX, and benchmarks

It is **not yet** a classical federated-learning runtime with gradient or weight aggregation.
It is closer to **receipt-backed policy distillation over a graph-mediated runtime**.

## Product Boundary

`conversation -> control policy -> graph state -> benchmarked runtime -> operator UX`

## Orchestrator Loop

```mermaid
flowchart LR
  U["User Ask"] --> A["POST /turn"]
  A --> P["Policy: bits, vectors, tensors"]
  P --> R["Runtime route + lane choice"]
  R --> G["Graph state"]
  R --> E["Bounded execution lane"]
  E --> X["Receipt + graph update"]
  G --> A
  X --> A
  A --> V["Chat / Confirm / Inspect UX"]
```

## Cause And Effect

| In | Internal cause | Out | Why it matters for emergence |
| --- | --- | --- | --- |
| user ask | policy classifies task pressure | lane + UI mode | the harness chooses how to think before it acts |
| graph + receipts | runtime gets continuity and proof | constrained decision brief | the system works from memory, not just prompt text |
| execution gate | authority is evaluated | bounded action or refusal | action becomes governed, not automatic |
| bounded action | receipt + graph update emitted | durable consequence | the next run starts from a changed world |
| benchmark signal | policy can revise control weights | better future routing | the system learns from work |

Layman version: emergence comes from consequences feeding back into the same loop, not from hidden magic in the model.
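
The feedback loop can be sketched in a few lines. Everything here is a toy under stated assumptions: `GraphState`, `choose_lane`, `run_turn`, and the keyword-based lane classifier are hypothetical stand-ins, not the repo's runtime. Only the lane, gate, and UI-mode pairings follow the benchmark table in this README.

```python
from dataclasses import dataclass, field


@dataclass
class GraphState:
    """Toy stand-in for durable graph state: just a list of receipts."""
    receipts: list = field(default_factory=list)


def choose_lane(ask):
    # Hypothetical classifier: a keyword check stands in for the real policy.
    return "execution" if ask.lower().startswith("run ") else "memory"


def run_turn(ask, graph, authorized):
    """One pass of the loop: classify, gate, act, then record the consequence."""
    lane = choose_lane(ask)
    if lane == "memory":
        # Memory lane: answer from graph state, no authority opened.
        return {"lane": lane, "execution_gate": False, "ui_mode": "chat",
                "receipts_seen": len(graph.receipts)}
    if not authorized:
        # Gate stays closed: refusal instead of automatic action.
        return {"lane": lane, "execution_gate": False, "refused": True}
    # Bounded action: emit a receipt and update the graph.
    receipt = {"ask": ask, "turn": len(graph.receipts) + 1}
    graph.receipts.append(receipt)
    return {"lane": lane, "execution_gate": True, "ui_mode": "confirm",
            "receipt": receipt}


graph = GraphState()
first = run_turn("summarise the current runtime", graph, authorized=True)
second = run_turn("run the demo manifest", graph, authorized=True)
third = run_turn("summarise the current runtime", graph, authorized=True)
print(first["receipts_seen"], third["receipts_seen"])
```

The third turn sees a receipt that the first turn could not, which is the "next run starts from a changed world" row in miniature.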

## Core Visibility

Start here if you want to understand the machine, not just the file tree:

- [`docs/core_workflows_v0.md`](docs/core_workflows_v0.md)
- [`docs/api_surface_v0.md`](docs/api_surface_v0.md)
- [`docs/emergent_feature_taxonomy_v0.md`](docs/emergent_feature_taxonomy_v0.md)
- [`docs/corpus_reasoning_substrate_v0.md`](docs/corpus_reasoning_substrate_v0.md)
- [`docs/adding_benchmarks_v0.md`](docs/adding_benchmarks_v0.md)

This repo is the shipped product surface.

It should contain:

- one local graph kernel
- one local product runtime
- one local control-language pack
- one local conversational API contract
- one local orchestrator UX contract
- one local control benchmark suite

The current cross-repo hotgraph is a bootstrap intake lane only.
It helps import and compare source systems, but it is not the long-term shipped boundary.

## Why This Is More Than A Chat Agent

| Baseline pattern | This repo adds |
| --- | --- |
| prompt in, answer out | policy-governed orchestrator loop |
| tool call when needed | lane-based authority and execution gates |
| memory as context stuffing | graph + receipt-backed operational memory |
| hidden heuristics | explicit bits, vectors, tensors |
| evals as sidecar | benchmark-governed adaptation |

## Standalone Benchmark Results

Latest standalone run:

- [`runs/benchmark/standalone-control-20260421T104519Z/summary.json`](runs/benchmark/standalone-control-20260421T104519Z/summary.json)
- [`runs/benchmark/standalone-control-20260421T104519Z/report.md`](runs/benchmark/standalone-control-20260421T104519Z/report.md)

| Case | Lane | Execution gate | UI mode | Latency ms | Causal read |
| --- | --- | --- | --- | ---: | --- |
| graph-first | `memory` | `false` | `chat` | `153.5` | no authority opened; graph-first reasoning stayed active |
| freshness override | `memory` | `false` | `chat` | `224.6` | latest receipt-backed artifact answered from graph state, not raw execution |
| unsupported schedule | `memory` | `false` | `chat` | `153.7` | unsupported query abstained instead of inventing a schedule |
| UTIR execution | `execution` | `true` | `confirm` | `266.5` | imported kernel language opened bounded action with receipt |
| allowlisted exec | `execution` | `true` | `confirm` | `271.1` | local policy allowed bounded shell action with receipt |

### Scorecard metrics

| Metric | Result | Why it matters in layman terms |
| --- | ---: | --- |
| `freshness_override_accuracy` | `1.0` | the system now respects the newest durable artifact instead of stale state |
| `unsupported_query_abstention_rate` | `1.0` | the harness knows when not to pretend it knows |
| `cost_per_successful_task` | `1.4` | routing is now measurable in simple operational units, not vibes |
| `settling_time_turns` | `1` | after an execution disturbance, the controller returns to steady memory mode in one turn |
| `oscillation_count` | `2` | the loop flips lanes only when the work really changes, not continuously |
| `avg_latency_ms` | `213.9` | the local loop stays fast while handling both memory and execution cases |
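
As a cross-check, `avg_latency_ms` can be recomputed from the per-case latencies in the results table. This sketch assumes the metric is a plain arithmetic mean over the five cases, rounded to one decimal place; the benchmark's own aggregation code is not shown here.

```python
from statistics import mean

# Latencies (ms) copied from the "Latency ms" column of the results table.
latencies_ms = {
    "graph-first": 153.5,
    "freshness override": 224.6,
    "unsupported schedule": 153.7,
    "UTIR execution": 266.5,
    "allowlisted exec": 271.1,
}

# Assumption: a simple unweighted mean, rounded to one decimal place.
avg_latency_ms = round(mean(list(latencies_ms.values())), 1)
print(avg_latency_ms)  # 213.9
```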

### Why this is in the product

| Benchmark feature | Product effect |
| --- | --- |
| freshness override | proves graph memory can answer from the latest receipt-backed state |
| unsupported-query abstention | proves the chat surface can fail closed instead of hallucinating |
| cost per successful task | proves routing can be tuned economically, not just qualitatively |
| settling time | proves the controller can recover from execution back to a stable reasoning lane |
| oscillation count | proves the runtime is not thrashing between lanes unnecessarily |

Run it locally:

```bash
./bin/bvtctl benchmark
```

### Vector frontier

| Mode | plan | execute | review | promote |
| --- | ---: | ---: | ---: | ---: |
| graph-first | `0.7` | `0.1` | `0.6` | `0.2` |
| execution | `0.7` | `0.8` | `0.6` | `0.2` |
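
Read as two four-component vectors, the rows differ in a single component. The sketch below only restates the table values and subtracts them. The names `STAGES`, `VECTORS`, and `stage_deltas` are illustrative, not identifiers from the repo.

```python
# Column order of the vector frontier table.
STAGES = ("plan", "execute", "review", "promote")

# Values copied from the table above.
VECTORS = {
    "graph-first": (0.7, 0.1, 0.6, 0.2),
    "execution": (0.7, 0.8, 0.6, 0.2),
}


def stage_deltas(a, b):
    """Per-stage difference b - a, rounded to absorb float noise."""
    return {stage: round(y - x, 1) for stage, x, y in zip(STAGES, a, b)}


deltas = stage_deltas(VECTORS["graph-first"], VECTORS["execution"])
moved = [stage for stage, delta in deltas.items() if delta != 0]
print(moved)  # ['execute']
```

In these two rows only the `execute` weight changes when the lane switches, while `plan`, `review`, and `promote` stay fixed.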

### Tensor frontier

| Mode | strongest source | dominant stage | weakest metric | frontier read |
| --- | --- | --- | --- | --- |
| graph-first | `thread_state` | `discover` | `promotion_readiness` | `graph_first_reasoning` |
| execution | `receipt_state` | `execute` | `promotion_readiness` | `bounded_execution_with_receipt` |

Layman version: the standalone harness now proves that the same API surface can stay in reasoning mode when authority should stay closed, and switch into receipt-backed execution when authority should open.

### Stability frontier

| Sequence | Lane path | Result | Why it matters |
| --- | --- | --- | --- |
| `graph_first -> utir_execution -> freshness_override -> unsupported_schedule` | `memory -> execution -> memory -> memory` | `settling_time_turns=1`, `oscillation_count=2` | one real disturbance, one recovery step, no pointless thrashing |
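
Both stability numbers can be reproduced from the lane path. The definitions below are assumptions chosen to match the reported values, not the benchmark's confirmed formulas: oscillation count is the number of adjacent lane changes, and settling time is the number of turns from the onset of a disturbance until the lane is back in steady `memory` mode.

```python
def oscillation_count(lanes):
    """Count adjacent turns where the lane changes (assumed definition)."""
    return sum(1 for prev, cur in zip(lanes, lanes[1:]) if prev != cur)


def settling_time_turns(lanes, steady="memory"):
    """Longest run of turns from leaving the steady lane to re-entering it."""
    longest = 0
    onset = None  # turn index where the current disturbance began
    for turn, lane in enumerate(lanes):
        if lane != steady:
            if onset is None:
                onset = turn
        elif onset is not None:
            longest = max(longest, turn - onset)
            onset = None
    return longest


# Lane path copied from the stability frontier table.
lane_path = ["memory", "execution", "memory", "memory"]
print(oscillation_count(lane_path))    # 2
print(settling_time_turns(lane_path))  # 1
```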

## Quickstart

### 1. Start from one conversational front door

```bash
./bin/bvtctl "summarise the current runtime"
./bin/bvtctl chat
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl context
./bin/bvtctl bootstrap-context
```

Why in plain English: the product should meet the operator as one sentence-driven CLI first, and it should explain its own lineage and policy without repo spelunking.

### 2. Bootstrap from the current source field

```bash
./scripts/build_hotgraph.sh
```

Outputs:

- `hotgraph/source_hotgraph.json`
- `hotgraph/source_hotgraph.md`

### 3. Generate one bootstrap control packet

```bash
./scripts/run_v0.sh migration_brief
```

Outputs:

- `runs/<timestamp>/control_packet.json`
- `runs/<timestamp>/brief.md`

### 4. Read the product slices

Start with:

- `docs/core_workflows_v0.md`
- `docs/api_surface_v0.md`
- `docs/emergent_feature_taxonomy_v0.md`
- `docs/standalone_layout.md`
- `docs/runtime_and_cli.md`
- `docs/kernel_import_plan.md`
- `docs/usecases.md`
- `docs/research_and_generation.md`
- `hotgraph/source_hotgraph.md`

### 5. Build the corpus reasoning packets

```bash
./bin/bvtctl corpus-packets
```

Why in plain English: this should build one mathematical packet per promoted abstraction so Codex can reason over state packets first and reopen raw evidence only when the packet state says it must.

### 6. Generate from reduced state

The intended flow is:

1. use the bootstrap hotgraph to decide what to import or vendor
2. move the chosen slice into the local product layout
3. benchmark the local slice through the control language
4. expose it through the conversational API and orchestrator UX

## Primary Use Cases

| Use case | Why it exists | Real-world analog |
| --- | --- | --- |
| runtime migration planning | decide what to import into the standalone product | editorial desk |
| control scorecard generation | measure bits, vectors, tensors, memory, safety, and economics | cockpit dashboard |
| conversational API shaping | keep one front door over graph state | front desk |
| orchestrator UX shaping | expose one operator surface over the API | control console |
| product quickstart packs | expose the use cases before the machinery | storefront brochure |

## Repo Layout

| Path | Purpose |
| --- | --- |
| `configs/product_slices.json` | canonical standalone product slices |
| `configs/source_registry.json` | canonical source list |
| `configs/usecase_registry.json` | canonical use-case list |
| `configs/kernel_stack.json` | local kernel/runtime/control stack target |
| `schemas/control_packet_v0.json` | minimal packet contract |
| `scripts/build_hotgraph.sh` | thin hotgraph compiler |
| `scripts/generate_control_packet.sh` | thin packet compiler |
| `scripts/run_v0.sh` | one-pass runner |
| `bin/bvtctl` | thin CLI entrypoint |
| `vendor/` | vendored off-the-shelf substrate imports |
| `runtime/` | local runtime contract and entrypoint |
| `policy/` | bits/vector/tensor control language and rules |
| `benchmarks/` | local control evals and scorecards |
| `api/` | conversational API contracts |
| `ux/` | orchestrator UX contracts |
| `docs/usecases.md` | product-first quickstart |
| `docs/core_workflows_v0.md` | orchestrator loop and core workflows |
| `docs/api_surface_v0.md` | visible API spec |
| `docs/emergent_feature_taxonomy_v0.md` | emergent feature taxonomy |
| `docs/standalone_layout.md` | shipped product layout |
| `docs/runtime_and_cli.md` | single runtime and CLI contract |
| `docs/research_and_generation.md` | how research and generation should work |
| `hotgraph/` | generated graph and summaries |
| `runs/` | generated packets, scorecards, and benchmark runs |

## Product Slices

Pareto read:

| Slice | Why it is first-class | Real-world analog |
| --- | --- | --- |
| graph kernel | holds durable graph state and receipts | engine block |
| control language | expresses bits, vectors, tensors | instrument cluster |
| control benchmarks | proves the control layers work | test track |
| conversational API | one user-facing front door | reception desk |
| orchestrator UX | one operator surface over the API | cockpit |

Layman version: the benchmark, API, and UX are not sidecars. They are part of the shipped machine.

## Bootstrap Intake

The intake field is still:

- `NIX.codecli`
- `tmp-meta3-engine-test`
- `meta3-graph-core`
- `dreaming-kernel`
- `nix-star`
- `causal-workbench`
- `tiny_graph_engine`

These are source systems to compare and import from.
They are not the final runtime boundary of this repo.

## Kernel Stack Target

| Layer | Local target | Upstream reference | Why |
| --- | --- | --- | --- |
| graph kernel | `vendor/meta3-graph-core` | `meta3-graph-core` | deterministic graph reifier |
| product runtime | `runtime/` | `causal-workbench` | thin routed manifest runtime |
| control plane | `policy/` | `nix-star` | ledger-first confidence and policy |
| benchmark plane | `benchmarks/` | local eval doctrine | prove quality, cost, and stability |
| API plane | `api/` | conversational front door doctrine | one interface |
| UX plane | `ux/` | orchestrator surface doctrine | one operator desk |

Layman version: import the best engine, keep one dashboard, and mount the gauges inside the same car.

## CLI

Use the current CLI like this:

```bash
./bin/bvtctl "summarise the current runtime"
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl chat
./bin/bvtctl context
./bin/bvtctl bootstrap-context
./bin/bvtctl bootstrap
./bin/bvtctl run migration_brief
./bin/bvtctl execute
./bin/bvtctl kernel-plan
./bin/bvtctl benchmark
./bin/bvtctl packet migration_brief
./bin/bvtctl policy
./bin/bvtctl scorecard
./bin/bvtctl runtime
./bin/bvtctl api
./bin/bvtctl ux
```

Why in plain English: the sentence form is now the default front door; the named commands are the operator shelf behind the desk.

## First Local Imports

The first real local slices are now:

- `policy/control_language_v0.json`
- `policy/runtime_profiles_v0.json`
- `benchmarks/control_scorecard_v0.json`
- `benchmarks/control_scorecard_v0.md`
- `runtime/work_manifest_v0.json`
- `runtime/work_manifest_packet_v0.json`
- `runtime/examples/demo_manifest.json`
- `runtime/examples/demo_exec_manifest.json`
- `api/conversational_api_v0.json`
- `ux/orchestrator_ux_v0.md`
- `vendor/meta3-graph-core/schema/receipt_v1.json`

Why: the control language and proof surface are the fastest way to make this repo operational before deeper kernel import.

## Working Rule

Pareto frontier:

- do not ship a repo of external path pointers
- do not invent a new kernel if an off-the-shelf one is already stronger
- do not let docs become the only state
- do not treat benchmarks, API, or UX as optional afterthoughts

Instead:

1. compare source systems
2. choose the strongest slice
3. import it into the local layout
4. benchmark it through bits/vectors/tensors
5. expose it through one API and one UX

## Current Read

Best direction today:

- `meta3-graph-core` = graph kernel import target
- `causal-workbench` = product-runtime pattern
- `nix-star` = control-language and policy pattern
- `NIX.codecli` = conversational API/orchestrator UX reference field
- local `benchmarks/` = mandatory proof surface

That is why this repo exists: to turn the best scattered ideas into one standalone control product.