	# NEXUS OS - Canonical Project State

	Date: 2026-04-23
	Latest: 703 passed, HEAD 87f0a31, branch feature/opusman-vault-integration

	## Session Progress (2026-04-23)
	- Stress Lab: 484 v3 rows, 11,000 v2 rows (13 categories), 1,092 cross-stress rows
	- Code & reasoning deficit supplement: +3,500 rows (2K code, 1.5K reasoning)
	- 9 custom evaluators (governance + dual-axis + governance-layer)
	- Two failure modes: under-refusal (4 models, DeepSeek-R1 worst 13.8%) vs over-refusal (2 models, Claude Opus 4.6 worst 9.0%)
	- v4.1 Priority System: 1-5-4-2-3 execution order
	- Governance Vulnerability Surface paper draft
	- azure-ai-evaluation 1.16.6: multi-turn + SDL fixes
	- MS AGT 3.2.2 + Agent Security Harness 4.4.0 installed
	- Foundry Red Team integration + ISC custom objectives (65/agent)
	- Pipeline: 41 MB Parquet / 300 MB (14%), 195,706 rows, 0 empty ground_truth
	- 13/13 governance categories at 0% gap, 3/3 dimension coverage
	- Moltbook collapse prediction framework drafted

	---

	# Historical (2026-04-21)

	Date: 2026-04-21
	Current local HEAD: 8f928bd
	Branch: bugfix/p0-cycle-detection-encryption-hardfail
	Status: M3 hardened baseline preserved; Phase 0 grounding in progress.

	## Verification Gate

	Latest local verification:

	```text
	.\venv\Scripts\python.exe -m pytest tests/ -q --tb=short
	617 passed in 16.99s
	```
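The gate above can also be checked programmatically. The following is a minimal sketch, not part of the repository: `parse_passed` and `verify_baseline` are hypothetical helper names, and it assumes the suite lives under `tests/` and that pytest prints its usual `N passed` summary line.

```python
import re
import subprocess
import sys

# Matches the "N passed" part of a pytest -q summary such as "617 passed in 16.99s".
_PASSED_RE = re.compile(r"(\d+) passed")


def parse_passed(summary: str) -> int:
    """Return the passed-test count found in pytest output, or 0 if none is reported."""
    match = _PASSED_RE.search(summary)
    return int(match.group(1)) if match else 0


def verify_baseline(expected_minimum: int, test_dir: str = "tests/") -> bool:
    """Run pytest and require a clean exit plus at least `expected_minimum` passes."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", test_dir, "-q", "--tb=short"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and parse_passed(result.stdout) >= expected_minimum
```

Calling `verify_baseline(617)` before a core commit would fail if the suite regresses below the recorded baseline.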

	The older report reference to commit `34c700b` is a historical/alternate-worktree marker. The current local repository HEAD is `8f928bd` after these follow-up commits:

	- `6fe8cf4 fix(core): harden db encryption and task dependencies`
	- `18cba07 fix(engine): correct dependency cycle traversal`
	- `8f928bd docs(agent): separate Nexus protocol from Codex hygiene`
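For readers unfamiliar with the kind of check that `18cba07` touches, here is a minimal sketch of dependency cycle detection using a depth-first traversal with three node states. It is an illustration only: `find_cycle` and the dict-of-lists graph shape are assumptions, not the engine's actual code.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task ids, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GRAY
        stack.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the current path, so this closes a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[task] = BLACK
        return None

    for task in deps:
        if color.get(task, WHITE) == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None
```

The sketch is recursive for brevity; a very deep task graph would need an iterative traversal to avoid Python's recursion limit.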

	## Core Thesis

	Nexus OS turns local models, research evidence, and external teams into a governed, audited, low-VRAM execution system where every action is proposal-bound, test-gated, and provenance-tracked.

	- DoppelGround prepares evidence.
	- Nexus governs, routes, audits, and approves.
	- TWAVE executes within VRAM limits.
	- GeniusTurtle makes the system usable.
	- Model Arena reports what actually works on local hardware.

	## System Boundaries

	| Layer | Canonical Role | Current Rule |
	|---|---|---|
	| GeniusTurtle | Operator UX layer | UI/API integration only; no model weights, secrets, or governance internals. |
	| Nexus OS | Governance and orchestration layer | Python/FastAPI governance is the canonical brain. |
	| DoppelGround | Evidence preparation layer | USE MODE; outputs must be sanitized before handoff. |
	| TWAVE | Low-VRAM execution layer | HOLD; wrapper/API work only, no algorithm changes. |
	| Model Arena | Evidence/evaluation layer | Report-only; no automatic model deletion, fine-tuning, or promotion. |

	## Core Architecture Map

	| Pillar | Purpose | Canonical Areas |
	|---|---|---|
	| Bridge | Protocol boundary, API ingress, SDK/MCP adapters | `src/nexus_os/bridge/`, `src/nexus_os/relay/` |
	| Governor | KAIJU, policy, compliance, trust gates | `src/nexus_os/governor/` |
	| Vault | Durable storage, 5-track memory, encryption policy | `src/nexus_os/vault/`, `src/nexus_os/db/` |
	| Engine/GMR | DAG routing, Hermes/GMR decisions, execution flow | `src/nexus_os/engine/`, `src/nexus_os/gmr/` |
	| Monitoring | TokenGuard, VAP/audit, telemetry | `src/nexus_os/monitoring/`, `src/nexus_os/observability/` |

	## What Is Verified In This Repo

	- Full test suite passes locally: `617 passed`.
	- DB encryption policy hard-fails by default and allows plaintext fallback only when `allow_unencrypted=True`.
	- Engine task dependency cycle detection is present and verified.
	- Project-level `AGENTS.md` now describes Nexus operating rules.
	- Codex connector hygiene is isolated to `.codex/plugin_hygiene_policy.md`.
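The hard-fail encryption policy in the list above can be illustrated with a small sketch. Only the `allow_unencrypted=True` flag comes from this document; `resolve_storage_mode` and `EncryptionRequiredError` are hypothetical names, and the real Vault code may be structured differently.

```python
from typing import Optional


class EncryptionRequiredError(RuntimeError):
    """Raised when no key is supplied and plaintext fallback was not allowed."""


def resolve_storage_mode(key: Optional[bytes], allow_unencrypted: bool = False) -> str:
    """Return "encrypted" or "plaintext"; hard-fail unless fallback is explicitly enabled."""
    if key:
        return "encrypted"
    if allow_unencrypted:
        # Plaintext is an explicit opt-in by the caller, never a silent default.
        return "plaintext"
    raise EncryptionRequiredError(
        "no encryption key available; pass allow_unencrypted=True to permit plaintext"
    )
```

Making the unsafe path an explicit keyword argument keeps a missing key from quietly downgrading storage to plaintext.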

	## Appendix Assets Available But Not Yet Canonical

	The following useful assets are present in `C:\Users\speci.000\Downloads` but are not yet integrated as canonical tracked Nexus files:

	| Asset | Status |
	|---|---|
	| `governor_skill_gate.py` | Reference GSPP/Governor implementation; requires diff review before promotion. |
	| `gspp_openapi.yaml` | Reference GSPP OpenAPI spec; not yet canonical in this repo. |
	| `wiki_pipeline.py` | Reference DoppelGround wiki/proposal pipeline; not yet canonical in this repo. |
	| `PROJECT_HANDOFF_SPEC.md` | External-team handoff reference; not yet canonical in this repo. |
	| Downloads `dg_to_gspp.py` | Fuller converter than the current root file; requires reconciliation before replacement. |

	Current root files with related functionality:

	- `dg_to_gspp.py`
	- `mock_api_server.py`
	- `langfuse_tracker.py`
	- `supabase_client.py`

	Phase 0 guidance documents:

	- `docs/operations/PHASE0_IMPLEMENTATION_PACKAGE.md`
	- `CODEX_HANDOFF.md`

	## Accepted Principles

	- Governance Control Plane first: Python/FastAPI is canonical.
	- Dashboard second: Bun/Next/relay layers must proxy governance state, not contain governance decisions.
	- Retroactive provenance starts dry-run/report-only.
	- Mini Model Arena starts in Phase 0 as a bounded evidence tool.
	- GVAW is mandatory for externalized work: proposal-linked branches, VAP/trust trailers, reviewed merges.
	- Public/private split is required before launch.
	- Cloud/local OpenClaw coordination uses Git as the bus; cloud writes tasks/specs, local runs GPU/model/TWAVE work.

	## Rejected Or Parked

	- Bun relay calling Python classes directly.
	- Auto-committing retroactive provenance.
	- Broad `git add .` without review.
	- Deleting model packs without inventory, backup, and rollback path.
	- Heretic/uncensoring or fine-tuning in P0.
	- External handoff before DoppelGround leak status is resolved.
	- Claims of cryptographic VAP, full A2A, OWASP ASI, SkillFortify, or production ASBOM maturity unless locally verified.

	## Critical Blockers

	1. DoppelGround leak status must be resolved before external handoff or public repo flip.
	2. Dashboard/relay still needs real governance API wiring.
	3. GSPP reference assets from Downloads need reconciliation before they become canonical.
	4. Public launch files still need security/legal review before public release.
	5. Sandbox/mock env files must not be committed without an explicit policy decision.
	6. Review-chain package claims must be grounded against tracked repo files before implementation.

	## Canonical P0 Sequence

	1. Reverify the test baseline before core commits.
	2. Keep Git clean with explicit-path staging only.
	3. Triage the DoppelGround gitleaks report, classifying each finding as a real secret or a false positive.
	4. Add or update a canonical integration ledger for repos, ports, APIs, and protected files.
	5. Build Python/FastAPI governance endpoints: `/skills/propose`, `/skills/status/{id}`, `/dashboard/stats`, `/governance/proposals`, `/governance/approve`.
	6. Update dashboard/relay to consume the Python governance API.
	7. Add `nexus-scan.py` as dry-run provenance inventory only.
	8. Add `model_arena/mini_arena.py` as report-only evidence collection.
	9. Build `nexus_knowledge_base/` from sanitized DoppelGround exports with evidence hashes and quality labels.
	10. Hand off to external teams only after security and governance API gates pass.
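As a rough illustration of step 5, the sketch below models the proposal lifecycle that the governance endpoints would sit on top of. It is framework-free on purpose and entirely hypothetical: `ProposalStore`, its method names, and the `pending`/`approved` states are assumptions, and the FastAPI routing itself is not shown.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProposalStore:
    """In-memory stand-in for the state behind the governance endpoints."""

    records: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def propose(self, skill_name: str) -> str:
        """Would back /skills/propose: register a proposal and return its id."""
        proposal_id = uuid.uuid4().hex
        self.records[proposal_id] = {"skill": skill_name, "state": "pending"}
        return proposal_id

    def status(self, proposal_id: str) -> str:
        """Would back /skills/status/{id}; raises KeyError for unknown ids."""
        return self.records[proposal_id]["state"]

    def approve(self, proposal_id: str) -> None:
        """Would back /governance/approve: only pending proposals can be approved."""
        record = self.records[proposal_id]
        if record["state"] != "pending":
            raise ValueError("proposal is not pending")
        record["state"] = "approved"

    def stats(self) -> Dict[str, int]:
        """Would back /dashboard/stats: count proposals per state."""
        counts: Dict[str, int] = {}
        for record in self.records.values():
            counts[record["state"]] = counts.get(record["state"], 0) + 1
        return counts
```

Keeping this state in the Python layer matches the principle that the dashboard and relay proxy governance state rather than make governance decisions.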

	## Port Map

	- `7352`: Nexus governance/control API and dashboard stats.
	- `7353`: TWAVE wrapper under `/twave/*`.
	- `11434`: local Ollama; internal only, not for external teams.
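The port map can be kept as data so that handoff tooling never advertises an internal port. This is a minimal sketch with hypothetical names (`PortEntry`, `PORT_MAP`, `shareable_ports`). The document marks only `11434` as internal; treating the other two ports as shareable is an assumption.

```python
from typing import Dict, List, NamedTuple


class PortEntry(NamedTuple):
    service: str
    internal_only: bool


# Descriptions follow the port map above; the internal_only=False values are assumptions.
PORT_MAP: Dict[int, PortEntry] = {
    7352: PortEntry("Nexus governance/control API and dashboard stats", False),
    7353: PortEntry("TWAVE wrapper under /twave/*", False),
    11434: PortEntry("local Ollama", True),
}


def shareable_ports(port_map: Dict[int, PortEntry]) -> List[int]:
    """Return the ports that may be documented for external teams, in ascending order."""
    return sorted(port for port, entry in port_map.items() if not entry.internal_only)
```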

	## Untracked Drafts Requiring Review

	These files/directories are currently untracked and intentionally not committed yet:

	- `nexus_knowledge_base/`
	- `sandbox/`
	- `test_integration.py`

	Reason: they contain policy, onboarding, sandbox, or integration-test draft content that needs separate content/security review.

	`CONTRIBUTING.md` and `ONBOARDING.md` have been promoted to canonical documentation after review and commit.


	## NEXUS OS - CANONICAL STATE v3.2 (2026-04-28)

	STATUS: Phase 0 Grounding Complete | 674 tests passing

	### CORE THESIS
	Nexus OS turns local models, research evidence, and external teams into a governed, audited, token-efficient execution system.

	### CURRENT FOCUS
	- Token saver system refinement
	- Skill-crafting and self-learning data collection
	- Continuous improvement of agent workflows
	- Governance hardening (7352)

	### KEY SYSTEMS
	- Trust Scoring v2.1 (lane-scoped, non-compensatory)
	- TokenGuard (budget enforcement + 429 responses)
	- 5-Track Memory (event, trust, capability, failure, governance)
	- GMR Engine (intelligent model routing)
	- SkillSmith (self-improvement loop)
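To make the TokenGuard entry concrete, here is a minimal sketch of budget enforcement that answers with HTTP 429 (Too Many Requests) once a budget is exhausted. `TokenBudget` and `charge` are hypothetical names, and the real TokenGuard may track budgets per lane, per agent, or per time window.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> Tuple[int, str]:
        """Return an (HTTP status, message) pair; reject requests that would exceed the budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            # Rejected requests do not consume budget.
            return 429, "token budget exhausted"
        self.used += tokens
        return 200, "ok"
```

With `TokenBudget(limit=100)`, a 60-token request is accepted, a following 50-token request is rejected with 429, and a 40-token request still fits.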

	### DATA COLLECTION PRIORITY
	Gather token savings data, success patterns, and failure patterns to refine:
	- Skill-crafting system
	- Self-learning algorithms
	- Prompt optimization
	- Workflow efficiency

	### NEXT MILESTONE
	Integrate collected data into SkillSmith for autonomous improvement.
