---
title: Bit Vector Tensor Control Policy
emoji: 🧭
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.1
python_version: '3.11'
app_file: app.py
pinned: false
---

Bit/Vector/Tensor Control Policy

Clean product one-liner: a standalone control-system repo with one local graph kernel stack, one conversational API, one orchestrator UX, and first-class control benchmarks driven by bits, vectors, and tensors.

Layman version: this should be one self-contained machine, not a folder of pointers to other machines.

Inference Surface

Clean product one-liner: graph-first reasoning can now be delegated to the local Codex CLI through one repo-root YAML config.

| Surface | What it does | Why it matters |
|---|---|---|
| inference.yaml | selects the inference backend and model | keeps inference policy in repo, not hidden in shell history |
| scripts/run_codex_inference.py | runs schema-shaped codex exec turns | uses the signed-in ChatGPT subscription path |
| api/run_turn.sh | keeps lane choice and response packet deterministic | model output is bounded by runtime policy, not vice versa |
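A minimal sketch of what "model output is bounded by runtime policy" can look like, assuming the model returns a JSON object with hypothetical `lane` and `answer` keys (the real response packet contract may differ): the runtime rejects malformed output and overwrites any lane the model proposes with the lane it has already chosen.

```python
import json

# Hypothetical packet contract: illustrative only, not the repo's actual schema.
REQUIRED_KEYS = {"lane", "answer"}
ALLOWED_LANES = {"memory", "execution"}


def bound_model_output(raw: str, runtime_lane: str) -> dict:
    """Parse a model turn and let runtime policy, not the model, own the lane."""
    if runtime_lane not in ALLOWED_LANES:
        raise ValueError(f"unknown runtime lane: {runtime_lane!r}")
    packet = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(packet, dict):
        raise ValueError("model output must be a JSON object")
    missing = REQUIRED_KEYS - packet.keys()
    if missing:
        raise ValueError(f"model output missing keys: {sorted(missing)}")
    # The model may propose a lane, but the runtime decision always wins.
    packet["lane"] = runtime_lane
    return packet


if __name__ == "__main__":
    raw = '{"lane": "execution", "answer": "summary of the runtime"}'
    print(bound_model_output(raw, runtime_lane="memory")["lane"])  # memory
```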

Self-Improve Loop

Clean product one-liner: the repo can now propose one bounded self-improvement through Codex CLI, package it as a manifest, and optionally apply and benchmark it.

| Surface | What it does | Why it matters |
|---|---|---|
| self_improve.yaml | defines allowed roots and default benchmark | keeps self-edit policy explicit |
| scripts/propose_self_improvement.py | uses Codex CLI to emit a bounded manifest proposal | the model proposes changes as structured state, not loose prose |
| scripts/run_self_improve.py | writes proposal artifacts, optionally applies them, and runs the benchmark | self-building stays receipt-backed and reviewable |
| bvtctl self-improve | operator front door for proposal-only runs | lets the system suggest its next bounded change |
| bvtctl self-improve-apply | proposal + apply + benchmark | closes the loop without opening broad ungated recursion |
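One way to enforce the allowed roots from self_improve.yaml before anything is applied, sketched with the standard library only. The function name, the flat list of proposed paths, and the example roots are assumptions for illustration; the actual manifest format is whatever scripts/propose_self_improvement.py emits.

```python
from pathlib import PurePosixPath


def paths_outside_roots(paths, allowed_roots):
    """Return proposed edit paths that fall outside the allowed roots (hypothetical helper)."""
    roots = [PurePosixPath(root) for root in allowed_roots]
    rejected = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Absolute paths and ".." segments could escape the repo, so reject them outright.
        if path.is_absolute() or ".." in path.parts:
            rejected.append(raw)
            continue
        # Accept a path only if it is a root itself or sits underneath one.
        if not any(path == root or root in path.parents for root in roots):
            rejected.append(raw)
    return rejected


if __name__ == "__main__":
    # Hypothetical proposal paths and roots; real values come from self_improve.yaml.
    proposal = ["policy/control_language_v0.json", "api/run_turn.sh", "../outside.txt"]
    print(paths_outside_roots(proposal, ["policy", "benchmarks"]))
    # ['api/run_turn.sh', '../outside.txt']
```

An empty result means every proposed edit stays inside the declared roots; anything returned should block the apply step.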

What This Is

Pareto read:

| Question | Accurate answer |
|---|---|
| are we an agent harness? | yes |
| metacognitive? | yes |
| metacybernetic? | yes |
| already a federated learning runtime? | not in the strict ML sense |

Better wording:

This repo is a standalone agent harness that should operate:

  • metacognitively: inspect state, confidence, and reasoning posture
  • metacybernetically: regulate routes, gates, escalation, and proof loops
  • cross-surface adaptively: distill patterns across memory, runtime, API, UX, and benchmarks

It is not yet a classical federated-learning runtime with gradient or weight aggregation. It is closer to receipt-backed policy distillation over a graph-mediated runtime.

Product Boundary

conversation -> control policy -> graph state -> benchmarked runtime -> operator UX

Orchestrator Loop

```mermaid
flowchart LR
  U["User Ask"] --> A["POST /turn"]
  A --> P["Policy: bits, vectors, tensors"]
  P --> R["Runtime route + lane choice"]
  R --> G["Graph state"]
  R --> E["Bounded execution lane"]
  E --> X["Receipt + graph update"]
  G --> A
  X --> A
  A --> V["Chat / Confirm / Inspect UX"]
```

Cause And Effect

| In | Internal cause | Out | Why it matters for emergence |
|---|---|---|---|
| user ask | policy classifies task pressure | lane + UI mode | the harness chooses how to think before it acts |
| graph + receipts | runtime gets continuity and proof | constrained decision brief | the system works from memory, not just prompt text |
| execution gate | authority is evaluated | bounded action or refusal | action becomes governed, not automatic |
| bounded action | receipt + graph update emitted | durable consequence | the next run starts from a changed world |
| benchmark signal | policy can revise control weights | better future routing | the system learns from work |

Layman version: emergence comes from consequences feeding back into the same loop, not from hidden magic in the model.
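The cause-and-effect chain above can be sketched as a single toy turn. Everything here is hypothetical: `GraphState`, `classify`, and `turn` are illustrative names, and the keyword classifier stands in for the real bits/vectors/tensors policy. The lane-to-UI-mode mapping (memory to chat, execution to confirm) follows the benchmark table in this README.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GraphState:
    """Toy stand-in for the graph kernel: an append-only list of receipts."""
    receipts: list = field(default_factory=list)


def classify(ask: str) -> str:
    """Hypothetical pressure signal: imperative verbs push the ask into the execution lane."""
    return "execution" if {"run", "execute"} & set(ask.lower().split()) else "memory"


def turn(ask: str, graph: GraphState, allowlist: set, action: Optional[str] = None) -> dict:
    lane = classify(ask)                                     # user ask -> lane
    ui_mode = "confirm" if lane == "execution" else "chat"   # lane -> UI mode
    gate_open = lane == "execution" and action in allowlist  # execution gate
    if lane == "execution":
        # Allowed actions and refusals both leave a durable receipt in graph state,
        # so the next turn starts from a changed world.
        status = "allowed" if gate_open else "refused"
        graph.receipts.append({"action": action, "status": status})
    return {"lane": lane, "ui_mode": ui_mode, "execution_gate": gate_open}


if __name__ == "__main__":
    graph = GraphState()
    print(turn("summarise the current runtime", graph, {"demo_manifest"}))
    # {'lane': 'memory', 'ui_mode': 'chat', 'execution_gate': False}
    print(turn("run the demo manifest", graph, {"demo_manifest"}, action="demo_manifest"))
    # {'lane': 'execution', 'ui_mode': 'confirm', 'execution_gate': True}
    print(graph.receipts)
    # [{'action': 'demo_manifest', 'status': 'allowed'}]
```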

Core Visibility

Start here if you want to understand the machine, not just the file tree:

This repo is the shipped product surface.

It should contain:

  • one local graph kernel
  • one local product runtime
  • one local control-language pack
  • one local conversational API contract
  • one local orchestrator UX contract
  • one local control benchmark suite

The current cross-repo hotgraph is a bootstrap intake lane only. It helps import and compare source systems, but it is not the long-term shipped boundary.

Why This Is More Than A Chat Agent

| Baseline pattern | This repo adds |
|---|---|
| prompt in, answer out | policy-governed orchestrator loop |
| tool call when needed | lane-based authority and execution gates |
| memory as context stuffing | graph + receipt-backed operational memory |
| hidden heuristics | explicit bits, vectors, tensors |
| evals as sidecar | benchmark-governed adaptation |

Standalone Benchmark Results

Latest standalone run:

| Case | Lane | Execution gate | UI mode | Latency (ms) | Causal read |
|---|---|---|---|---|---|
| graph-first | memory | false | chat | 153.5 | no authority opened; graph-first reasoning stayed active |
| freshness override | memory | false | chat | 224.6 | latest receipt-backed artifact answered from graph state, not raw execution |
| unsupported schedule | memory | false | chat | 153.7 | unsupported query abstained instead of inventing a schedule |
| UTIR execution | execution | true | confirm | 266.5 | imported kernel language opened bounded action with receipt |
| allowlisted exec | execution | true | confirm | 271.1 | local policy allowed bounded shell action with receipt |

Scorecard metrics

| Metric | Result | Why it matters in layman terms |
|---|---|---|
| freshness_override_accuracy | 1.0 | the system now respects the newest durable artifact instead of stale state |
| unsupported_query_abstention_rate | 1.0 | the harness knows when not to pretend it knows |
| cost_per_successful_task | 1.4 | routing is now measurable in simple operational units, not vibes |
| settling_time_turns | 1 | after an execution disturbance, the controller returns to steady memory mode in one turn |
| oscillation_count | 2 | the loop flips lanes only when the work really changes, not continuously |
| avg_latency_ms | 213.9 | the local loop stays fast while handling both memory and execution cases |
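The avg_latency_ms figure can be checked directly against the per-case latencies in the standalone run table; the sketch below recomputes it and, as an extra view, splits the mean by lane. Only the numbers come from this README; the data layout is an assumption.

```python
from statistics import mean

# (lane, latency_ms) per case, copied from the "Latest standalone run" table.
RUN = {
    "graph-first": ("memory", 153.5),
    "freshness override": ("memory", 224.6),
    "unsupported schedule": ("memory", 153.7),
    "UTIR execution": ("execution", 266.5),
    "allowlisted exec": ("execution", 271.1),
}

avg_latency_ms = round(mean(latency for _, latency in RUN.values()), 1)
latency_by_lane = {
    lane: round(mean(ms for case_lane, ms in RUN.values() if case_lane == lane), 1)
    for lane in ("memory", "execution")
}

print(avg_latency_ms)   # 213.9, matching the scorecard
print(latency_by_lane)  # {'memory': 177.3, 'execution': 268.8}
```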

Why this is part of the product

| Benchmark feature | Product effect |
|---|---|
| freshness override | proves graph memory can answer from the latest receipt-backed state |
| unsupported-query abstention | proves the chat surface can fail closed instead of hallucinating |
| cost per successful task | proves routing can be tuned economically, not just qualitatively |
| settling time | proves the controller can recover from execution back to a stable reasoning lane |
| oscillation count | proves the runtime is not thrashing between lanes unnecessarily |
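A minimal sketch of the freshness-override and abstention behaviour described above, assuming receipts are plain dicts with hypothetical `artifact`, `ts`, and `value` fields (not the receipt_v1 schema): the newest receipt for an artifact wins, and an artifact with no receipt returns None instead of a guessed answer.

```python
from datetime import datetime
from typing import Optional


def answer_from_graph(receipts: list, artifact: str) -> Optional[dict]:
    """Return the newest receipt for `artifact`, or None to abstain when none exists."""
    matching = [r for r in receipts if r["artifact"] == artifact]
    if not matching:
        return None  # fail closed: no receipt-backed state, so do not invent an answer
    # Freshness override: the latest durable artifact wins over older (stale) state.
    return max(matching, key=lambda r: datetime.fromisoformat(r["ts"]))


if __name__ == "__main__":
    # Hypothetical receipts; field names are illustrative only.
    receipts = [
        {"artifact": "runtime_summary", "ts": "2025-01-01T09:00:00", "value": "stale summary"},
        {"artifact": "runtime_summary", "ts": "2025-01-02T09:00:00", "value": "fresh summary"},
    ]
    print(answer_from_graph(receipts, "runtime_summary")["value"])  # fresh summary
    print(answer_from_graph(receipts, "schedule"))                  # None
```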

Run it locally:

rtk ./bin/bvtctl benchmark

Vector frontier

| Mode | plan | execute | review | promote |
|---|---|---|---|---|
| graph-first | 0.7 | 0.1 | 0.6 | 0.2 |
| execution | 0.7 | 0.8 | 0.6 | 0.2 |
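Read as vectors over the same four stages, the two modes differ in exactly one component: execute rises from 0.1 to 0.8 while plan, review, and promote stay fixed. The sketch below only diffs the numbers in the table; it makes no claim about how the runtime consumes these weights.

```python
# Rows of the vector frontier table, transcribed as tuples over the same stage order.
STAGES = ("plan", "execute", "review", "promote")
FRONTIER = {
    "graph-first": (0.7, 0.1, 0.6, 0.2),
    "execution": (0.7, 0.8, 0.6, 0.2),
}


def stage_deltas(src: str, dst: str, tol: float = 1e-9) -> dict:
    """Per-stage change from mode `src` to mode `dst`; unchanged stages are omitted."""
    return {
        stage: round(b - a, 2)  # round away float noise such as 0.7000000000000001
        for stage, a, b in zip(STAGES, FRONTIER[src], FRONTIER[dst])
        if abs(b - a) > tol
    }


if __name__ == "__main__":
    print(stage_deltas("graph-first", "execution"))  # {'execute': 0.7}
```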

Tensor frontier

| Mode | strongest source | dominant stage | weakest metric | frontier read |
|---|---|---|---|---|
| graph-first | thread_state | discover | promotion_readiness | graph_first_reasoning |
| execution | receipt_state | execute | promotion_readiness | bounded_execution_with_receipt |

Layman version: the standalone harness now proves that the same API surface can stay in reasoning mode when authority should stay closed, and switch into receipt-backed execution when authority should open.

Stability frontier

| Sequence | Lane path | Result | Why it matters |
|---|---|---|---|
| graph_first -> utir_execution -> freshness_override -> unsupported_schedule | memory -> execution -> memory -> memory | settling_time_turns=1, oscillation_count=2 | one real disturbance, one recovery step, no pointless thrashing |
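The two stability numbers can be reproduced from the lane path under one plausible reading of the metrics, which is an assumption rather than the benchmark's documented definition: oscillation_count counts lane switches between consecutive turns, and settling_time_turns counts turns from the first departure from memory until the path returns to memory and stays there.

```python
from typing import Optional, Sequence


def oscillation_count(path: Sequence[str]) -> int:
    """Number of lane switches between consecutive turns."""
    return sum(1 for prev, cur in zip(path, path[1:]) if prev != cur)


def settling_time_turns(path: Sequence[str], steady: str = "memory") -> Optional[int]:
    """Turns from the first disturbance until the lane returns to `steady` for good.

    Returns 0 if the path never leaves `steady`, and None if it has not
    settled by the end of the observed sequence.
    """
    off = [i for i, lane in enumerate(path) if lane != steady]
    if not off:
        return 0
    settled_at = off[-1] + 1  # first turn after the last disturbance
    if settled_at >= len(path):
        return None
    return settled_at - off[0]


if __name__ == "__main__":
    path = ["memory", "execution", "memory", "memory"]
    print(settling_time_turns(path), oscillation_count(path))  # 1 2
```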

Quickstart

1. Start from one conversational front door

./bin/bvtctl "summarise the current runtime"
./bin/bvtctl chat
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl context
./bin/bvtctl bootstrap-context

Why in plain English: the product should meet the operator as one sentence-driven CLI first, and it should explain its own lineage and policy without repo spelunking.

2. Bootstrap from the current source field

./scripts/build_hotgraph.sh

Outputs:

  • hotgraph/source_hotgraph.json
  • hotgraph/source_hotgraph.md

3. Generate one bootstrap control packet

./scripts/run_v0.sh migration_brief

Outputs:

  • runs/<timestamp>/control_packet.json
  • runs/<timestamp>/brief.md
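A sketch of the output layout for a single run, assuming a UTC timestamp format like `20250102T030405Z` (the real runner may name directories differently) and treating the control packet as an opaque dict; `write_run` is a hypothetical helper, not part of scripts/run_v0.sh.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_run(base, packet: dict, brief: str, now: Optional[datetime] = None) -> Path:
    """Write one run's outputs under <base>/runs/<timestamp>/ and return that directory."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical timestamp format; the real runner may name run directories differently.
    run_dir = Path(base) / "runs" / now.strftime("%Y%m%dT%H%M%SZ")
    run_dir.mkdir(parents=True, exist_ok=False)  # one directory per run, never overwritten
    (run_dir / "control_packet.json").write_text(json.dumps(packet, indent=2, sort_keys=True) + "\n")
    (run_dir / "brief.md").write_text(brief)
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo_root:
        stamp = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
        run_dir = write_run(repo_root, {"profile": "migration_brief"}, "# Migration brief\n", now=stamp)
        print(sorted(p.name for p in run_dir.iterdir()))  # ['brief.md', 'control_packet.json']
```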

4. Read the product slices

Start with:

  • docs/core_workflows_v0.md
  • docs/api_surface_v0.md
  • docs/emergent_feature_taxonomy_v0.md
  • docs/standalone_layout.md
  • docs/runtime_and_cli.md
  • docs/kernel_import_plan.md
  • docs/usecases.md
  • docs/research_and_generation.md
  • hotgraph/source_hotgraph.md

5. Build the corpus reasoning packets

./bin/bvtctl corpus-packets

Why in plain English: this should build one mathematical packet per promoted abstraction so Codex can reason over state packets first and reopen raw evidence only when the packet state says it must.
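The packet-first rule can be sketched as a single predicate. `CorpusPacket`, its `confidence` and `stale` fields, and the 0.6 threshold are hypothetical; the README does not specify which packet state triggers a return to raw evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusPacket:
    """Hypothetical reduced-state packet for one promoted abstraction."""
    abstraction: str
    confidence: float  # 0.0-1.0: how well the packet state covers the abstraction
    stale: bool        # True if the underlying evidence changed since the packet was built


def must_reopen_evidence(packet: CorpusPacket, min_confidence: float = 0.6) -> bool:
    """Reason from the packet by default; reopen raw evidence only when its state demands it."""
    return packet.stale or packet.confidence < min_confidence


if __name__ == "__main__":
    print(must_reopen_evidence(CorpusPacket("lane_gating", 0.9, stale=False)))  # False
    print(must_reopen_evidence(CorpusPacket("lane_gating", 0.4, stale=False)))  # True
```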

6. Generate from reduced state

The intended flow is:

  1. use the bootstrap hotgraph to decide what to import or vendor
  2. move the chosen slice into the local product layout
  3. benchmark the local slice through the control language
  4. expose it through the conversational API and orchestrator UX

Primary Use Cases

| Use case | Why it exists | Real-world analog |
|---|---|---|
| runtime migration planning | decide what to import into the standalone product | editorial desk |
| control scorecard generation | measure bits, vectors, tensors, memory, safety, and economics | cockpit dashboard |
| conversational API shaping | keep one front door over graph state | front desk |
| orchestrator UX shaping | expose one operator surface over the API | control console |
| product quickstart packs | expose the use cases before the machinery | storefront brochure |

Repo Layout

| Path | Purpose |
|---|---|
| configs/product_slices.json | canonical standalone product slices |
| configs/source_registry.json | canonical source list |
| configs/usecase_registry.json | canonical use-case list |
| configs/kernel_stack.json | local kernel/runtime/control stack target |
| schemas/control_packet_v0.json | minimal packet contract |
| scripts/build_hotgraph.sh | thin hotgraph compiler |
| scripts/generate_control_packet.sh | thin packet compiler |
| scripts/run_v0.sh | one-pass runner |
| bin/bvtctl | thin CLI entrypoint |
| vendor/ | vendored off-the-shelf substrate imports |
| runtime/ | local runtime contract and entrypoint |
| policy/ | bits/vector/tensor control language and rules |
| benchmarks/ | local control evals and scorecards |
| api/ | conversational API contracts |
| ux/ | orchestrator UX contracts |
| docs/usecases.md | product-first quickstart |
| docs/core_workflows_v0.md | orchestrator loop and core workflows |
| docs/api_surface_v0.md | visible API spec |
| docs/emergent_feature_taxonomy_v0.md | emergent feature taxonomy |
| docs/standalone_layout.md | shipped product layout |
| docs/runtime_and_cli.md | single runtime and CLI contract |
| docs/research_and_generation.md | how research and generation should work |
| hotgraph/ | generated graph and summaries |
| runs/ | future generated packets and scorecards |

Product Slices

Pareto read:

| Slice | Why it is first-class | Real-world analog |
|---|---|---|
| graph kernel | holds durable graph state and receipts | engine block |
| control language | expresses bits, vectors, tensors | instrument cluster |
| control benchmarks | proves the control layers work | test track |
| conversational API | one user-facing front door | reception desk |
| orchestrator UX | one operator surface over the API | cockpit |

Layman version: the benchmark, API, and UX are not sidecars. They are part of the shipped machine.

Bootstrap Intake

The intake field is still:

  • NIX.codecli
  • tmp-meta3-engine-test
  • meta3-graph-core
  • dreaming-kernel
  • nix-star
  • causal-workbench
  • tiny_graph_engine

These are source systems to compare and import from. They are not the final runtime boundary of this repo.

Kernel Stack Target

| Layer | Local target | Upstream reference | Why |
|---|---|---|---|
| graph kernel | vendor/meta3-graph-core | meta3-graph-core | deterministic graph reifier |
| product runtime | runtime/ | causal-workbench | thin routed manifest runtime |
| control plane | policy/ | nix-star | ledger-first confidence and policy |
| benchmark plane | benchmarks/ | local eval doctrine | prove quality, cost, and stability |
| API plane | api/ | conversational front door doctrine | one interface |
| UX plane | ux/ | orchestrator surface doctrine | one operator desk |

Layman version: import the best engine, keep one dashboard, and mount the gauges inside the same car.
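The "no external path pointers" rule can be checked mechanically against the kernel stack table. This sketch transcribes the table into a dict (the actual shape of configs/kernel_stack.json is not shown here) and flags any layer whose local target does not sit under one of the repo's own top-level directories. It is a prefix check only and does not normalise paths such as `vendor/../..`.

```python
# Kernel stack table transcribed as data: layer -> (local target, upstream reference).
KERNEL_STACK = {
    "graph kernel": ("vendor/meta3-graph-core", "meta3-graph-core"),
    "product runtime": ("runtime/", "causal-workbench"),
    "control plane": ("policy/", "nix-star"),
    "benchmark plane": ("benchmarks/", "local eval doctrine"),
    "API plane": ("api/", "conversational front door doctrine"),
    "UX plane": ("ux/", "orchestrator surface doctrine"),
}

# Top-level directories from the repo layout that count as "inside the product".
LOCAL_ROOTS = ("vendor/", "runtime/", "policy/", "benchmarks/", "api/", "ux/")


def external_pointers(stack: dict) -> list:
    """Layers whose local target is not a path inside the repo's own layout."""
    return [
        layer
        for layer, (target, _upstream) in stack.items()
        if not target.startswith(LOCAL_ROOTS)  # str.startswith accepts a tuple of prefixes
    ]


if __name__ == "__main__":
    print(external_pointers(KERNEL_STACK))  # []
```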

CLI

Use the current CLI like this:

./bin/bvtctl "summarise the current runtime"
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl chat
./bin/bvtctl context
./bin/bvtctl bootstrap-context
./bin/bvtctl bootstrap
./bin/bvtctl run migration_brief
./bin/bvtctl execute
./bin/bvtctl kernel-plan
./bin/bvtctl benchmark
./bin/bvtctl packet migration_brief
./bin/bvtctl policy
./bin/bvtctl scorecard
./bin/bvtctl runtime
./bin/bvtctl api
./bin/bvtctl ux

Why in plain English: the sentence form is now the default front door; the named commands are the operator shelf behind the desk.
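One plausible dispatch rule behind "sentence form first, named commands behind the desk", sketched in Python. The command set is collected from this README and the routing logic is a guess, not the actual bin/bvtctl implementation.

```python
from typing import List, Tuple

# Named commands collected from this README; hypothetical, not read from bin/bvtctl.
NAMED_COMMANDS = frozenset({
    "ask", "chat", "context", "bootstrap-context", "bootstrap", "run", "execute",
    "kernel-plan", "benchmark", "packet", "policy", "scorecard", "runtime",
    "api", "ux", "corpus-packets", "self-improve", "self-improve-apply",
})


def route(argv: List[str]) -> Tuple[str, str, List[str]]:
    """Dispatch to a named command on an exact first-token match; otherwise join argv as a sentence."""
    if argv and argv[0] in NAMED_COMMANDS:
        return ("command", argv[0], argv[1:])
    return ("sentence", " ".join(argv), [])


if __name__ == "__main__":
    print(route(["summarise the current runtime"]))
    # ('sentence', 'summarise the current runtime', [])
    print(route(["ask", "run the demo manifest", "runtime/examples/demo_manifest.json"]))
    # ('command', 'ask', ['run the demo manifest', 'runtime/examples/demo_manifest.json'])
```

Shell quoting does the disambiguation here: `./bin/bvtctl "run the demo manifest"` arrives as one argument and is routed as a sentence, while `./bin/bvtctl run migration_brief` hits the named `run` command.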

First Local Imports

The first real local slices are now:

  • policy/control_language_v0.json
  • policy/runtime_profiles_v0.json
  • benchmarks/control_scorecard_v0.json
  • benchmarks/control_scorecard_v0.md
  • runtime/work_manifest_v0.json
  • runtime/work_manifest_packet_v0.json
  • runtime/examples/demo_manifest.json
  • runtime/examples/demo_exec_manifest.json
  • api/conversational_api_v0.json
  • ux/orchestrator_ux_v0.md
  • vendor/meta3-graph-core/schema/receipt_v1.json

Why: the control language and proof surface are the fastest way to make this repo operational before deeper kernel import.

Working Rule

Pareto frontier:

  • do not ship a repo of external path pointers
  • do not invent a new kernel if an off-the-shelf one is already stronger
  • do not let docs become the only state
  • do not treat benchmarks, API, or UX as optional afterthoughts

Instead:

  1. compare source systems
  2. choose the strongest slice
  3. import it into the local layout
  4. benchmark it through bits/vectors/tensors
  5. expose it through one API and one UX

Current Read

Best direction today:

  • meta3-graph-core = graph kernel import target
  • causal-workbench = product-runtime pattern
  • nix-star = control-language and policy pattern
  • NIX.codecli = conversational API/orchestrator UX reference field
  • local benchmarks/ = mandatory proof surface

That is why this repo exists: to turn the best scattered ideas into one standalone control product.