NEXUS OS - Canonical Project State
Date: 2026-04-23 Latest: 703 passed, HEAD 87f0a31, branch feature/opusman-vault-integration
Session Progress (2026-04-23)
- Stress Lab: 484 v3 rows, 11,000 v2 rows (13 categories), 1,092 cross-stress rows
- Code & reasoning deficit supplement: +3,500 rows (2K code, 1.5K reasoning)
- 9 custom evaluators (governance + dual-axis + governance-layer)
- Two failure modes: under-refusal (4 models, DeepSeek-R1 worst 13.8%) vs over-refusal (2 models, Claude Opus 4.6 worst 9.0%)
- v4.1 Priority System: 1-5-4-2-3 execution order
- Governance Vulnerability Surface paper draft
- azure-ai-evaluation 1.16.6: multi-turn + SDL fixes
- MS AGT 3.2.2 + Agent Security Harness 4.4.0 installed
- Foundry Red Team integration + ISC custom objectives (65/agent)
- Pipeline: 41 MB Parquet / 300 MB (14%), 195,706 rows, 0 empty ground_truth
- 13/13 governance categories at 0% gap, 3/3 dimension coverage
- Moltbook collapse prediction framework drafted
Historical (2026-04-21)
- Date: 2026-04-21
- Current local HEAD: 8f928bd
- Branch: bugfix/p0-cycle-detection-encryption-hardfail
- Status: M3 hardened baseline preserved; Phase 0 grounding in progress.
Verification Gate
Latest local verification:
.\venv\Scripts\python.exe -m pytest tests/ -q --tb=short
617 passed in 16.99s
The older report reference to commit 34c700b is a historical/alternate-worktree marker. The current local repository HEAD is 8f928bd after these follow-up commits:
- 6fe8cf4 fix(core): harden db encryption and task dependencies
- 18cba07 fix(engine): correct dependency cycle traversal
- 8f928bd docs(agent): separate Nexus protocol from Codex hygiene
Core Thesis
Nexus OS turns local models, research evidence, and external teams into a governed, audited, low-VRAM execution system where every action is proposal-bound, test-gated, and provenance-tracked.
- DoppelGround prepares evidence.
- Nexus governs, routes, audits, and approves.
- TWAVE executes within VRAM limits.
- GeniusTurtle makes the system usable.
- Model Arena reports what actually works on local hardware.
System Boundaries
| Layer | Canonical Role | Current Rule |
|---|---|---|
| GeniusTurtle | Operator UX layer | UI/API integration only; no model weights, secrets, or governance internals. |
| Nexus OS | Governance and orchestration layer | Python/FastAPI governance is the canonical brain. |
| DoppelGround | Evidence preparation layer | USE MODE; outputs must be sanitized before handoff. |
| TWAVE | Low-VRAM execution layer | HOLD; wrapper/API work only, no algorithm changes. |
| Model Arena | Evidence/evaluation layer | Report-only; no automatic model deletion, fine-tuning, or promotion. |
Core Architecture Map
| Pillar | Purpose | Canonical Areas |
|---|---|---|
| Bridge | Protocol boundary, API ingress, SDK/MCP adapters | src/nexus_os/bridge/, src/nexus_os/relay/ |
| Governor | KAIJU, policy, compliance, trust gates | src/nexus_os/governor/ |
| Vault | Durable storage, 5-track memory, encryption policy | src/nexus_os/vault/, src/nexus_os/db/ |
| Engine/GMR | DAG routing, Hermes/GMR decisions, execution flow | src/nexus_os/engine/, src/nexus_os/gmr/ |
| Monitoring | TokenGuard, VAP/audit, telemetry | src/nexus_os/monitoring/, src/nexus_os/observability/ |
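The Engine pillar's DAG routing presumes that task dependencies are acyclic. A minimal sketch of dependency cycle detection with a three-color depth-first search follows. The function name `find_cycle` and the `dict`-of-lists graph shape are assumptions for illustration, not the code under src/nexus_os/engine/.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task ids, or None for a DAG.

    `deps` maps a task id to the task ids it depends on (hypothetical shape).
    """
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GRAY
        path.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path from dep
                # back to dep closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[task] = BLACK
        return None

    for task in deps:
        if color.get(task, WHITE) == WHITE:
            found = visit(task)
            if found:
                return found
    return None
```

Returning the offending path rather than a bare boolean lets a caller report which tasks to untangle. The recursion is fine for small task graphs; a very deep graph would call for an explicit stack.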
What Is Verified In This Repo
- Full test suite passes locally: 617 passed.
- DB encryption policy hard-fails by default and allows plaintext fallback only when `allow_unencrypted=True`.
- Engine task dependency cycle detection is present and verified.
- Project-level `AGENTS.md` now describes Nexus operating rules.
- Codex connector hygiene is isolated to `.codex/plugin_hygiene_policy.md`.
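The hard-fail encryption policy described above can be sketched as follows. Only the `allow_unencrypted` flag comes from this document; `open_store`, `EncryptionUnavailableError`, and the `key` parameter are hypothetical names, and the function only decides the storage mode rather than opening a real database.

```python
from typing import Optional


class EncryptionUnavailableError(RuntimeError):
    """Raised when no key is available and plaintext was not explicitly allowed."""


def open_store(key: Optional[bytes], allow_unencrypted: bool = False) -> str:
    """Decide the storage mode; returns "encrypted" or "plaintext".

    Hypothetical helper: a real implementation would open the database here.
    """
    if key:
        return "encrypted"
    if allow_unencrypted:
        # Explicit opt-in is the only path to plaintext storage.
        return "plaintext"
    raise EncryptionUnavailableError(
        "no encryption key available; pass allow_unencrypted=True to permit plaintext"
    )
```

Defaulting the flag to `False` means a missing key surfaces as an error at startup instead of a silent downgrade to plaintext.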
Appendix Assets Available But Not Yet Canonical
The following useful assets are present in C:\Users\speci.000\Downloads but are not yet integrated as canonical tracked Nexus files:
| Asset | Status |
|---|---|
| `governor_skill_gate.py` | Reference GSPP/Governor implementation; requires diff review before promotion. |
| `gspp_openapi.yaml` | Reference GSPP OpenAPI spec; not yet canonical in this repo. |
| `wiki_pipeline.py` | Reference DoppelGround wiki/proposal pipeline; not yet canonical in this repo. |
| `PROJECT_HANDOFF_SPEC.md` | External-team handoff reference; not yet canonical in this repo. |
| Downloads `dg_to_gspp.py` | Fuller converter than the current root file; requires reconciliation before replacement. |
Current root files with related functionality:
- `dg_to_gspp.py`
- `mock_api_server.py`
- `langfuse_tracker.py`
- `supabase_client.py`
Phase 0 guidance documents:
- `docs/operations/PHASE0_IMPLEMENTATION_PACKAGE.md`
- `CODEX_HANDOFF.md`
Accepted Principles
- Governance Control Plane first: Python/FastAPI is canonical.
- Dashboard second: Bun/Next/relay layers must proxy governance state, not contain governance decisions.
- Retroactive provenance starts dry-run/report-only.
- Mini Model Arena starts in Phase 0 as a bounded evidence tool.
- GVAW is mandatory for externalized work: proposal-linked branches, VAP/trust trailers, reviewed merges.
- Public/private split is required before launch.
- Cloud/local OpenClaw coordination uses Git as the bus; cloud writes tasks/specs, local runs GPU/model/TWAVE work.
Rejected Or Parked
- Bun relay calling Python classes directly.
- Auto-committing retroactive provenance.
- Broad `git add .` without review.
- Deleting model packs without inventory, backup, and rollback path.
- Heretic/uncensoring or fine-tuning in P0.
- External handoff before DoppelGround leak status is resolved.
- Claims of cryptographic VAP, full A2A, OWASP ASI, SkillFortify, or production ASBOM maturity unless locally verified.
Critical Blockers
- DoppelGround leak status must be resolved before external handoff or public repo flip.
- Dashboard/relay still needs real governance API wiring.
- GSPP reference assets from Downloads need reconciliation before they become canonical.
- Public launch files still need security/legal review before public release.
- Sandbox/mock env files must not be committed without an explicit policy decision.
- Review-chain package claims must be grounded against tracked repo files before implementation.
Canonical P0 Sequence
- Reverify the test baseline before core commits.
- Keep Git clean with explicit-path staging only.
- Triage the DoppelGround gitleaks report into real secrets vs. false positives.
- Add or update a canonical integration ledger for repos, ports, APIs, and protected files.
- Build Python/FastAPI governance endpoints: `/skills/propose`, `/skills/status/{id}`, `/dashboard/stats`, `/governance/proposals`, `/governance/approve`.
- Update dashboard/relay to consume the Python governance API.
- Add `nexus-scan.py` as dry-run provenance inventory only.
- Add `model_arena/mini_arena.py` as report-only evidence collection.
- Build `nexus_knowledge_base/` from sanitized DoppelGround exports with evidence hashes and quality labels.
- Hand off to external teams only after security and governance API gates pass.
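A framework-free sketch of the proposal lifecycle behind the governance endpoints in this sequence. The endpoint paths come from this document; the `ProposalStore` class, its method names, the HTTP verbs in the comments, and the `pending`/`approved` states are all assumptions. A real implementation would expose these operations through FastAPI route handlers.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proposal:
    id: int
    skill: str
    status: str = "pending"  # assumed states: "pending" -> "approved"


@dataclass
class ProposalStore:
    """In-memory stand-in for the state behind the governance API (hypothetical)."""

    _items: Dict[int, Proposal] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def propose(self, skill: str) -> Proposal:
        # Would back POST /skills/propose (verb assumed).
        proposal = Proposal(id=next(self._ids), skill=skill)
        self._items[proposal.id] = proposal
        return proposal

    def status(self, proposal_id: int) -> str:
        # Would back GET /skills/status/{id} (verb assumed).
        return self._items[proposal_id].status

    def pending(self) -> List[Proposal]:
        # Would back GET /governance/proposals (verb and filter assumed).
        return [p for p in self._items.values() if p.status == "pending"]

    def approve(self, proposal_id: int) -> Proposal:
        # Would back POST /governance/approve; only pending proposals qualify.
        proposal = self._items[proposal_id]
        if proposal.status != "pending":
            raise ValueError(f"proposal {proposal_id} is {proposal.status}")
        proposal.status = "approved"
        return proposal

    def stats(self) -> Dict[str, int]:
        # Would back GET /dashboard/stats: proposal counts per status.
        counts: Dict[str, int] = {}
        for p in self._items.values():
            counts[p.status] = counts.get(p.status, 0) + 1
        return counts
```

Keeping the state transitions in one Python object matches the principle that the dashboard and relay proxy governance state rather than make governance decisions.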
Port Map
- `7352`: Nexus governance/control API and dashboard stats.
- `7353`: TWAVE wrapper under `/twave/*`.
- `11434`: local Ollama; internal only, not for external teams.
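One way to keep this port map machine-checkable is a small lookup table, sketched below. The port numbers and the internal-only rule for Ollama come from this document; the `Service` type, the service names, and the assumption that ports 7352 and 7353 may be shared with external teams are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Service:
    name: str            # hypothetical label, not a canonical service id
    internal_only: bool  # True means never documented for external teams


PORTS: Dict[int, Service] = {
    7352: Service("nexus-governance", internal_only=False),  # exposure assumed
    7353: Service("twave-wrapper", internal_only=False),     # exposure assumed
    11434: Service("ollama", internal_only=True),            # stated in the port map
}


def external_ports(ports: Dict[int, Service] = PORTS) -> List[int]:
    """Return the ports that may appear in external-team documentation."""
    return sorted(port for port, svc in ports.items() if not svc.internal_only)
```

A handoff check could then assert that `11434` never appears in `external_ports()` before any document leaves the repository.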
Untracked Drafts Requiring Review
These files/directories are currently untracked and intentionally not committed yet:
- `nexus_knowledge_base/`
- `sandbox/`
- `test_integration.py`
Reason: they contain policy, onboarding, sandbox, or integration-test draft content that needs separate content/security review.
CONTRIBUTING.md and ONBOARDING.md have been reviewed, committed, and promoted to canonical documentation.
##NEXUS OS — CANONICAL STATE v3.2 (2026-04-28)
STATUS: Phase 0 Grounding Complete | 674 tests passing
#CORE THESIS: Nexus OS turns local models, research evidence, and external teams into a governed, audited, token-efficient execution system.
#CURRENT FOCUS:
- Token saver system refinement
- Skill-crafting and self-learning data collection
- Continuous improvement of agent workflows
- Governance hardening (7352)
#KEY SYSTEMS:
- Trust Scoring v2.1 (lane-scoped, non-compensatory)
- TokenGuard (budget enforcement + 429 responses)
- 5-Track Memory (event, trust, capability, failure, governance)
- GMR Engine (intelligent model routing)
- SkillSmith (self-improvement loop)
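The TokenGuard behavior listed above (budget enforcement with 429 responses) can be sketched as a simple counter. The `TokenBudget` class, its fields, and the `(status, message)` return shape are assumptions for illustration; only the use of HTTP 429 for an exhausted budget comes from this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBudget:
    """Hypothetical per-caller token budget."""

    limit: int
    used: int = 0

    def charge(self, tokens: int) -> Tuple[int, str]:
        """Return an (HTTP status, message) pair; 429 if the charge would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            # Reject without consuming budget so a smaller request can still succeed.
            return 429, f"token budget exceeded: {self.used}/{self.limit} used"
        self.used += tokens
        return 200, "ok"
```

For example, with `TokenBudget(limit=100)`, charging 60 tokens succeeds, a further charge of 50 is rejected with 429, and a charge of 40 still fits.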
#DATA COLLECTION PRIORITY: Gather token savings data, success patterns, and failure patterns to refine:
- Skill-crafting system
- Self-learning algorithms
- Prompt optimization
- Workflow efficiency
#NEXT MILESTONE: Integrate collected data into SkillSmith for autonomous improvement.