metadata

title: Bit Vector Tensor Control Policy
emoji: 🧭
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.1
python_version: '3.11'
app_file: app.py
pinned: false

Bit/Vector/Tensor Control Policy

Clean product one-liner: a standalone control-system repo with one local graph kernel stack, one conversational API, one orchestrator UX, and first-class control benchmarks driven by bits, vectors, and tensors.

Layman version: this should be one self-contained machine, not a folder of pointers to other machines.

Inference Surface

Clean product one-liner: graph-first reasoning can now be delegated to the local Codex CLI through one repo-root YAML config.

Surface	What it does	Why it matters
`inference.yaml`	selects the inference backend and model	keeps inference policy in repo, not hidden in shell history
`scripts/run_codex_inference.py`	runs schema-shaped `codex exec` turns	uses the signed-in ChatGPT subscription path
`api/run_turn.sh`	keeps lane choice and response packet deterministic	model output is bounded by runtime policy, not vice versa

Self-Improve Loop

Clean product one-liner: the repo can now propose one bounded self-improvement through Codex CLI, package it as a manifest, and optionally apply and benchmark it.

Surface	What it does	Why it matters
`self_improve.yaml`	defines allowed roots and default benchmark	keeps self-edit policy explicit
`scripts/propose_self_improvement.py`	uses Codex CLI to emit a bounded manifest proposal	the model proposes changes as structured state, not loose prose
`scripts/run_self_improve.py`	writes proposal artifacts, optionally applies them, and runs the benchmark	self-building stays receipt-backed and reviewable
`bvtctl self-improve`	operator front door for proposal-only runs	lets the system suggest its next bounded change
`bvtctl self-improve-apply`	proposal + apply + benchmark	closes the loop without opening broad ungated recursion
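
The "allowed roots" idea above can be pictured as a path check run over a proposal before anything is applied. This is a minimal sketch, not the repo's implementation: it assumes a proposal lists repo-relative file paths and that `self_improve.yaml` yields a list of allowed root directories. The names `is_within_allowed_roots` and `split_proposal` and both data shapes are hypothetical.

```python
from pathlib import PurePosixPath


def is_within_allowed_roots(path: str, allowed_roots: list[str]) -> bool:
    """Return True if a repo-relative path sits under one of the allowed roots."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and any parent-directory traversal outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    for root in allowed_roots:
        root_parts = PurePosixPath(root).parts
        if not root_parts:
            continue  # an empty root would match everything, so skip it
        # Compare whole path components, so "policy_x/" never matches "policy".
        if candidate.parts[: len(root_parts)] == root_parts:
            return True
    return False


def split_proposal(paths: list[str], allowed_roots: list[str]):
    """Partition proposed paths into (accepted, rejected), preserving order."""
    accepted = [p for p in paths if is_within_allowed_roots(p, allowed_roots)]
    rejected = [p for p in paths if not is_within_allowed_roots(p, allowed_roots)]
    return accepted, rejected


# Hypothetical proposal: only the first path is inside an allowed root.
accepted, rejected = split_proposal(
    ["policy/control_language_v0.json", "../etc/passwd", "vendor/x", "/abs"],
    ["policy", "benchmarks"],
)
```

Comparing path components rather than string prefixes is the design choice that matters here: a prefix test would let a sibling directory with a similar name slip through.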

What This Is

Pareto read:

Question	Accurate answer
are we an agent harness?	yes
metacognitive?	yes
metacybernetic?	yes
already a federated learning runtime?	not in the strict ML sense

Better wording:

This repo is a standalone agent harness that should operate:

metacognitively: inspect state, confidence, and reasoning posture
metacybernetically: regulate routes, gates, escalation, and proof loops
cross-surface adaptively: distill patterns across memory, runtime, API, UX, and benchmarks

It is not yet a classical federated-learning runtime with gradient or weight aggregation. It is closer to receipt-backed policy distillation over a graph-mediated runtime.

Product Boundary

conversation -> control policy -> graph state -> benchmarked runtime -> operator UX

Orchestrator Loop

flowchart LR
  U["User Ask"] --> A["POST /turn"]
  A --> P["Policy: bits, vectors, tensors"]
  P --> R["Runtime route + lane choice"]
  R --> G["Graph state"]
  R --> E["Bounded execution lane"]
  E --> X["Receipt + graph update"]
  G --> A
  X --> A
  A --> V["Chat / Confirm / Inspect UX"]

Cause And Effect

In	Internal cause	Out	Why it matters for emergence
user ask	policy classifies task pressure	lane + UI mode	the harness chooses how to think before it acts
graph + receipts	runtime gets continuity and proof	constrained decision brief	the system works from memory, not just prompt text
execution gate	authority is evaluated	bounded action or refusal	action becomes governed, not automatic
bounded action	receipt + graph update emitted	durable consequence	the next run starts from a changed world
benchmark signal	policy can revise control weights	better future routing	the system learns from work

Layman version: emergence comes from consequences feeding back into the same loop, not from hidden magic in the model.
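
The feedback loop described above can be sketched in a few lines. This is an illustrative toy, not the repo's runtime: the `Harness` and `TurnResult` names, the allowlist shape, and the receipt fields are all assumptions. Only the lane, execution-gate, and UI-mode values mirror the ones this README reports.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnResult:
    lane: str             # "memory" or "execution"
    execution_gate: bool  # True only when authority was opened
    ui_mode: str          # "chat" or "confirm"
    brief: str


@dataclass
class Harness:
    allowlist: set  # hypothetical: names of actions the policy permits
    receipts: list = field(default_factory=list)  # durable consequences

    def turn(self, ask: str, action: Optional[str] = None) -> TurnResult:
        # Policy step: decide the lane before anything is executed.
        if action is None:
            return TurnResult("memory", False, "chat",
                              f"answered from {len(self.receipts)} receipt(s)")
        # Execution gate: authority is evaluated, not assumed.
        if action not in self.allowlist:
            return TurnResult("memory", False, "chat", f"refused: {action}")
        # Bounded action: emit a receipt so the next turn starts from a changed world.
        self.receipts.append({"ask": ask, "action": action})
        return TurnResult("execution", True, "confirm",
                          f"ran {action} with receipt")


harness = Harness(allowlist={"echo"})
first = harness.turn("summarise the current runtime")
second = harness.turn("run the demo", action="echo")
third = harness.turn("delete everything", action="rm")
fourth = harness.turn("what happened last?")
```

The fourth turn answers from one receipt, which is the point of the loop: the consequence of the second turn is visible to later turns without being restated in the prompt.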

Core Visibility

Start here if you want to understand the machine, not just the file tree:

This repo is the shipped product surface.

It should contain:

one local graph kernel
one local product runtime
one local control-language pack
one local conversational API contract
one local orchestrator UX contract
one local control benchmark suite

The current cross-repo hotgraph is a bootstrap intake lane only. It helps import and compare source systems, but it is not the long-term shipped boundary.

Why This Is More Than A Chat Agent

Baseline pattern	This repo adds
prompt in, answer out	policy-governed orchestrator loop
tool call when needed	lane-based authority and execution gates
memory as context stuffing	graph + receipt-backed operational memory
hidden heuristics	explicit bits, vectors, tensors
evals as sidecar	benchmark-governed adaptation

Standalone Benchmark Results

Latest standalone run:

Case	Lane	Execution gate	UI mode	Latency ms	Causal read
graph-first	`memory`	`false`	`chat`	`153.5`	no authority opened; graph-first reasoning stayed active
freshness override	`memory`	`false`	`chat`	`224.6`	latest receipt-backed artifact answered from graph state, not raw execution
unsupported schedule	`memory`	`false`	`chat`	`153.7`	unsupported query abstained instead of inventing a schedule
UTIR execution	`execution`	`true`	`confirm`	`266.5`	imported kernel language opened bounded action with receipt
allowlisted exec	`execution`	`true`	`confirm`	`271.1`	local policy allowed bounded shell action with receipt

Scorecard metrics

Metric	Result	Why it matters in layman terms
`freshness_override_accuracy`	`1.0`	the system now respects the newest durable artifact instead of stale state
`unsupported_query_abstention_rate`	`1.0`	the harness knows when not to pretend it knows
`cost_per_successful_task`	`1.4`	routing is now measurable in simple operational units, not vibes
`settling_time_turns`	`1`	after an execution disturbance, the controller returns to steady memory mode in one turn
`oscillation_count`	`2`	the loop flips lanes only when the work really changes, not continuously
`avg_latency_ms`	`213.9`	the local loop stays fast while handling both memory and execution cases
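
The `avg_latency_ms` value is consistent with an unweighted mean of the five case latencies in the results table. The check below is a sketch of that arithmetic only. It assumes a plain mean rounded to one decimal place; the scorecard code in `benchmarks/` may compute it differently.

```python
# Latencies copied from the "Latest standalone run" table, in milliseconds.
latencies_ms = {
    "graph-first": 153.5,
    "freshness override": 224.6,
    "unsupported schedule": 153.7,
    "UTIR execution": 266.5,
    "allowlisted exec": 271.1,
}

# Assumption: unweighted mean, rounded to one decimal place.
avg_latency_ms = round(sum(latencies_ms.values()) / len(latencies_ms), 1)
print(avg_latency_ms)  # → 213.9
```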

Why this is in the product

Benchmark feature	Product effect
freshness override	proves graph memory can answer from the latest receipt-backed state
unsupported-query abstention	proves the chat surface can fail closed instead of hallucinating
cost per successful task	proves routing can be tuned economically, not just qualitatively
settling time	proves the controller can recover from execution back to a stable reasoning lane
oscillation count	proves the runtime is not thrashing between lanes unnecessarily

Run it locally:

rtk ./bin/bvtctl benchmark

Vector frontier

Mode	plan	execute	review	promote
graph-first	`0.7`	`0.1`	`0.6`	`0.2`
execution	`0.7`	`0.8`	`0.6`	`0.2`
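
Reading the two rows side by side, the modes differ in a single component. A small sketch makes that explicit. The dictionary layout and the helper name `vector_delta` are assumptions for illustration; the values are copied from the table above.

```python
graph_first = {"plan": 0.7, "execute": 0.1, "review": 0.6, "promote": 0.2}
execution = {"plan": 0.7, "execute": 0.8, "review": 0.6, "promote": 0.2}


def vector_delta(base: dict, other: dict, ndigits: int = 2) -> dict:
    """Return only the components that change between two control vectors."""
    delta = {}
    for key in base:
        # Round to avoid float noise such as 0.7000000000000001.
        change = round(other[key] - base[key], ndigits)
        if change != 0:
            delta[key] = change
    return delta


print(vector_delta(graph_first, execution))  # → {'execute': 0.7}
```

Only `execute` moves between the two modes, which matches the claim that the same API surface stays in reasoning mode until authority opens.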

Tensor frontier

Mode	strongest source	dominant stage	weakest metric	frontier read
graph-first	`thread_state`	`discover`	`promotion_readiness`	`graph_first_reasoning`
execution	`receipt_state`	`execute`	`promotion_readiness`	`bounded_execution_with_receipt`

Layman version: the standalone harness now proves that the same API surface can stay in reasoning mode when authority should stay closed, and switch into receipt-backed execution when authority should open.

Stability frontier

Sequence	Lane path	Result	Why it matters
`graph_first -> utir_execution -> freshness_override -> unsupported_schedule`	`memory -> execution -> memory -> memory`	`settling_time_turns=1`, `oscillation_count=2`	one real disturbance, one recovery step, no pointless thrashing
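
The two stability numbers can be reproduced from the lane path under simple definitions. These definitions are assumptions chosen to match the row above, not the repo's documented formulas: `oscillation_count` is taken as the number of lane changes between consecutive turns, and `settling_time_turns` as the longest run of consecutive turns spent outside the steady lane before returning to it.

```python
from typing import Optional


def oscillation_count(path: list) -> int:
    """Count lane changes between consecutive turns."""
    return sum(1 for prev, cur in zip(path, path[1:]) if prev != cur)


def settling_time_turns(path: list, steady: str = "memory") -> Optional[int]:
    """Longest run of turns outside the steady lane; None if it never returns."""
    worst = 0
    i = 0
    while i < len(path):
        if path[i] == steady:
            i += 1
            continue
        start = i
        # Walk forward until the steady lane is regained.
        while i < len(path) and path[i] != steady:
            i += 1
        if i == len(path):
            return None  # the path ended while still disturbed
        worst = max(worst, i - start)
    return worst


lane_path = ["memory", "execution", "memory", "memory"]
print(oscillation_count(lane_path))    # → 2
print(settling_time_turns(lane_path))  # → 1
```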

Quickstart

1. Start from one conversational front door

./bin/bvtctl "summarise the current runtime"
./bin/bvtctl chat
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl context
./bin/bvtctl bootstrap-context

Why in plain English: the product should meet the operator as one sentence-driven CLI first, and it should explain its own lineage and policy without repo spelunking.

2. Bootstrap from the current source field

./scripts/build_hotgraph.sh

Outputs:

hotgraph/source_hotgraph.json
hotgraph/source_hotgraph.md

3. Generate one bootstrap control packet

./scripts/run_v0.sh migration_brief

Outputs:

runs/<timestamp>/control_packet.json
runs/<timestamp>/brief.md
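
To pick up the newest packet programmatically, something like the following would work. It is a sketch under two assumptions: the `<timestamp>` directory names sort lexicographically in chronological order, and `control_packet.json` is plain JSON. The helper name `latest_control_packet` is hypothetical and no packet fields are assumed.

```python
import json
from pathlib import Path
from typing import Optional


def latest_control_packet(runs_dir: Path) -> Optional[dict]:
    """Load control_packet.json from the last run directory, or None if absent."""
    # Assumption: directory names sort in chronological order.
    candidates = sorted(runs_dir.glob("*/control_packet.json"))
    if not candidates:
        return None
    return json.loads(candidates[-1].read_text(encoding="utf-8"))
```

Called as `latest_control_packet(Path("runs"))` from the repo root, it returns the parsed packet from the last run directory, or `None` when no run has produced a packet yet.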

4. Read the product slices

Start with:

docs/core_workflows_v0.md
docs/api_surface_v0.md
docs/emergent_feature_taxonomy_v0.md
docs/standalone_layout.md
docs/runtime_and_cli.md
docs/kernel_import_plan.md
docs/usecases.md
docs/research_and_generation.md
hotgraph/source_hotgraph.md

5. Build the corpus reasoning packets

./bin/bvtctl corpus-packets

Why in plain English: this should build one mathematical packet per promoted abstraction so Codex can reason over state packets first and reopen raw evidence only when the packet state says it must.

6. Generate from reduced state

The intended flow is:

use the bootstrap hotgraph to decide what to import or vendor
move the chosen slice into the local product layout
benchmark the local slice through the control language
expose it through the conversational API and orchestrator UX

Primary Use Cases

Use case	Why it exists	Real-world analog
runtime migration planning	decide what to import into the standalone product	editorial desk
control scorecard generation	measure bits, vectors, tensors, memory, safety, and economics	cockpit dashboard
conversational API shaping	keep one front door over graph state	front desk
orchestrator UX shaping	expose one operator surface over the API	control console
product quickstart packs	expose the use cases before the machinery	storefront brochure

Repo Layout

Path	Purpose
`configs/product_slices.json`	canonical standalone product slices
`configs/source_registry.json`	canonical source list
`configs/usecase_registry.json`	canonical use-case list
`configs/kernel_stack.json`	local kernel/runtime/control stack target
`schemas/control_packet_v0.json`	minimal packet contract
`scripts/build_hotgraph.sh`	thin hotgraph compiler
`scripts/generate_control_packet.sh`	thin packet compiler
`scripts/run_v0.sh`	one-pass runner
`bin/bvtctl`	thin CLI entrypoint
`vendor/`	vendored off-the-shelf substrate imports
`runtime/`	local runtime contract and entrypoint
`policy/`	bits/vector/tensor control language and rules
`benchmarks/`	local control evals and scorecards
`api/`	conversational API contracts
`ux/`	orchestrator UX contracts
`docs/usecases.md`	product-first quickstart
`docs/core_workflows_v0.md`	orchestrator loop and core workflows
`docs/api_surface_v0.md`	visible API spec
`docs/emergent_feature_taxonomy_v0.md`	emergent feature taxonomy
`docs/standalone_layout.md`	shipped product layout
`docs/runtime_and_cli.md`	single runtime and CLI contract
`docs/research_and_generation.md`	how research and generation should work
`hotgraph/`	generated graph and summaries
`runs/`	future generated packets and scorecards

Product Slices

Pareto read:

Slice	Why it is first-class	Real-world analog
graph kernel	holds durable graph state and receipts	engine block
control language	expresses bits, vectors, tensors	instrument cluster
control benchmarks	proves the control layers work	test track
conversational API	one user-facing front door	reception desk
orchestrator UX	one operator surface over the API	cockpit

Layman version: the benchmark, API, and UX are not sidecars. They are part of the shipped machine.

Bootstrap Intake

The intake field is still:

NIX.codecli
tmp-meta3-engine-test
meta3-graph-core
dreaming-kernel
nix-star
causal-workbench
tiny_graph_engine

These are source systems to compare and import from. They are not the final runtime boundary of this repo.

Kernel Stack Target

Layer	Local target	Upstream reference	Why
graph kernel	`vendor/meta3-graph-core`	`meta3-graph-core`	deterministic graph reifier
product runtime	`runtime/`	`causal-workbench`	thin routed manifest runtime
control plane	`policy/`	`nix-star`	ledger-first confidence and policy
benchmark plane	`benchmarks/`	local eval doctrine	prove quality, cost, and stability
API plane	`api/`	conversational front door doctrine	one interface
UX plane	`ux/`	orchestrator surface doctrine	one operator desk

Layman version: import the best engine, keep one dashboard, and mount the gauges inside the same car.

CLI

Use the current CLI like this:

./bin/bvtctl "summarise the current runtime"
./bin/bvtctl ask "run the demo manifest" runtime/examples/demo_manifest.json
./bin/bvtctl chat
./bin/bvtctl context
./bin/bvtctl bootstrap-context
./bin/bvtctl bootstrap
./bin/bvtctl run migration_brief
./bin/bvtctl execute
./bin/bvtctl kernel-plan
./bin/bvtctl benchmark
./bin/bvtctl packet migration_brief
./bin/bvtctl policy
./bin/bvtctl scorecard
./bin/bvtctl runtime
./bin/bvtctl api
./bin/bvtctl ux

Why in plain English: the sentence form is now the default front door; the named commands are the operator shelf behind the desk.

First Local Imports

The first real local slices are now:

policy/control_language_v0.json
policy/runtime_profiles_v0.json
benchmarks/control_scorecard_v0.json
benchmarks/control_scorecard_v0.md
runtime/work_manifest_v0.json
runtime/work_manifest_packet_v0.json
runtime/examples/demo_manifest.json
runtime/examples/demo_exec_manifest.json
api/conversational_api_v0.json
ux/orchestrator_ux_v0.md
vendor/meta3-graph-core/schema/receipt_v1.json

Why: the control language and proof surface are the fastest way to make this repo operational before deeper kernel import.

Working Rule

Pareto frontier:

do not ship a repo of external path pointers
do not invent a new kernel if an off-the-shelf one is already stronger
do not let docs become the only state
do not treat benchmarks, API, or UX as optional afterthoughts

Instead:

compare source systems
choose the strongest slice
import it into the local layout
benchmark it through bits/vectors/tensors
expose it through one API and one UX

Current Read

Best direction today:

meta3-graph-core = graph kernel import target
causal-workbench = product-runtime pattern
nix-star = control-language and policy pattern
NIX.codecli = conversational API/orchestrator UX reference field
local benchmarks/ = mandatory proof surface

That is why this repo exists: to turn the best scattered ideas into one standalone control product.