# NEXUS OS - Canonical Project State
Date: 2026-04-23
Latest: 703 passed, HEAD 87f0a31, branch feature/opusman-vault-integration
## Session Progress (2026-04-23)
- Stress Lab: 484 v3 rows, 11,000 v2 rows (13 categories), 1,092 cross-stress rows
- Code & reasoning deficit supplement: +3,500 rows (2K code, 1.5K reasoning)
- 9 custom evaluators (governance + dual-axis + governance-layer)
- Two failure modes: under-refusal (4 models, DeepSeek-R1 worst 13.8%) vs over-refusal (2 models, Claude Opus 4.6 worst 9.0%)
- v4.1 Priority System: 1-5-4-2-3 execution order
- Governance Vulnerability Surface paper draft
- azure-ai-evaluation 1.16.6: multi-turn + SDL fixes
- MS AGT 3.2.2 + Agent Security Harness 4.4.0 installed
- Foundry Red Team integration + ISC custom objectives (65/agent)
- Pipeline: 41 MB Parquet / 300 MB (14%), 195,706 rows, 0 empty ground_truth
- 13/13 governance categories at 0% gap, 3/3 dimension coverage
- Moltbook collapse prediction framework drafted
---
# Historical (2026-04-21)
Date: 2026-04-21
Current local HEAD: 8f928bd
Branch: bugfix/p0-cycle-detection-encryption-hardfail
Status: M3 hardened baseline preserved; Phase 0 grounding in progress.
## Verification Gate
Latest local verification:
```text
.\venv\Scripts\python.exe -m pytest tests/ -q --tb=short
617 passed in 16.99s
```
The older report reference to commit `34c700b` is a historical/alternate-worktree marker. The current local repository HEAD is `8f928bd` after these follow-up commits:
- `6fe8cf4 fix(core): harden db encryption and task dependencies`
- `18cba07 fix(engine): correct dependency cycle traversal`
- `8f928bd docs(agent): separate Nexus protocol from Codex hygiene`
## Core Thesis
Nexus OS turns local models, research evidence, and external teams into a governed, audited, low-VRAM execution system where every action is proposal-bound, test-gated, and provenance-tracked.
- DoppelGround prepares evidence.
- Nexus governs, routes, audits, and approves.
- TWAVE executes within VRAM limits.
- GeniusTurtle makes the system usable.
- Model Arena reports what actually works on local hardware.
## System Boundaries
| Layer | Canonical Role | Current Rule |
|---|---|---|
| GeniusTurtle | Operator UX layer | UI/API integration only; no model weights, secrets, or governance internals. |
| Nexus OS | Governance and orchestration layer | Python/FastAPI governance is the canonical brain. |
| DoppelGround | Evidence preparation layer | USE MODE; outputs must be sanitized before handoff. |
| TWAVE | Low-VRAM execution layer | HOLD; wrapper/API work only, no algorithm changes. |
| Model Arena | Evidence/evaluation layer | Report-only; no automatic model deletion, fine-tuning, or promotion. |
## Core Architecture Map
| Pillar | Purpose | Canonical Areas |
|---|---|---|
| Bridge | Protocol boundary, API ingress, SDK/MCP adapters | `src/nexus_os/bridge/`, `src/nexus_os/relay/` |
| Governor | KAIJU, policy, compliance, trust gates | `src/nexus_os/governor/` |
| Vault | Durable storage, 5-track memory, encryption policy | `src/nexus_os/vault/`, `src/nexus_os/db/` |
| Engine/GMR | DAG routing, Hermes/GMR decisions, execution flow | `src/nexus_os/engine/`, `src/nexus_os/gmr/` |
| Monitoring | TokenGuard, VAP/audit, telemetry | `src/nexus_os/monitoring/`, `src/nexus_os/observability/` |
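The Engine/GMR pillar routes work through a DAG, which presupposes that task dependencies never loop back on themselves. The following is a minimal sketch of one common way to check that (depth-first search with three node states). It is not the code in `src/nexus_os/engine/`; the function name `find_cycle` and the dict-of-lists input shape are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Node states for the depth-first search.
UNSEEN, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task ids, or None if acyclic.

    `deps` maps a task id to the task ids it depends on (hypothetical shape).
    """
    state: Dict[str, int] = {}
    path: List[str] = []  # tasks on the current DFS path, in visit order

    def visit(task: str) -> Optional[List[str]]:
        state[task] = IN_PROGRESS
        path.append(task)
        for dep in deps.get(task, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[task] = DONE
        return None

    for task in deps:
        if state.get(task, UNSEEN) == UNSEEN:
            cycle = visit(task)
            if cycle:
                return cycle
    return None
```

Returning the offending path, rather than a bare boolean, makes it possible to reject a task submission with a message that names the loop.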
## What Is Verified In This Repo
- Full test suite passes locally: `617 passed`.
- DB encryption policy hard-fails by default and allows plaintext fallback only when `allow_unencrypted=True`.
- Engine task dependency cycle detection is present and verified.
- Project-level `AGENTS.md` now describes Nexus operating rules.
- Codex connector hygiene is isolated to `.codex/plugin_hygiene_policy.md`.
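The hard-fail encryption rule in the list above can be pictured roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: only the `allow_unencrypted=True` flag comes from this document, while `resolve_storage_mode`, `EncryptionUnavailableError`, and the `key` parameter are hypothetical names.

```python
from typing import Optional


class EncryptionUnavailableError(RuntimeError):
    """Raised when no key is available and plaintext was not opted into."""


def resolve_storage_mode(key: Optional[bytes],
                         allow_unencrypted: bool = False) -> str:
    """Decide how the database may be opened.

    Hard-fails by default: plaintext is returned only when the caller
    explicitly passes allow_unencrypted=True.
    """
    if key:
        return "encrypted"
    if allow_unencrypted:
        return "plaintext"
    raise EncryptionUnavailableError(
        "no encryption key available and allow_unencrypted is False"
    )
```

Making the unsafe path an explicit keyword argument means a missing key surfaces as an error at startup instead of silently downgrading storage.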
## Appendix Assets Available But Not Yet Canonical
The following useful assets are present in `C:\Users\speci.000\Downloads` but are not yet integrated as canonical tracked Nexus files:
| Asset | Status |
|---|---|
| `governor_skill_gate.py` | Reference GSPP/Governor implementation; requires diff review before promotion. |
| `gspp_openapi.yaml` | Reference GSPP OpenAPI spec; not yet canonical in this repo. |
| `wiki_pipeline.py` | Reference DoppelGround wiki/proposal pipeline; not yet canonical in this repo. |
| `PROJECT_HANDOFF_SPEC.md` | External-team handoff reference; not yet canonical in this repo. |
| `dg_to_gspp.py` (Downloads copy) | Fuller converter than the current root file; requires reconciliation before replacement. |
Current root files with related functionality:
- `dg_to_gspp.py`
- `mock_api_server.py`
- `langfuse_tracker.py`
- `supabase_client.py`
Phase 0 guidance documents:
- `docs/operations/PHASE0_IMPLEMENTATION_PACKAGE.md`
- `CODEX_HANDOFF.md`
## Accepted Principles
- Governance Control Plane first: Python/FastAPI is canonical.
- Dashboard second: Bun/Next/relay layers must proxy governance state, not contain governance decisions.
- Retroactive provenance starts dry-run/report-only.
- Mini Model Arena starts in Phase 0 as a bounded evidence tool.
- GVAW is mandatory for externalized work: proposal-linked branches, VAP/trust trailers, reviewed merges.
- Public/private split is required before launch.
- Cloud/local OpenClaw coordination uses Git as the bus; cloud writes tasks/specs, local runs GPU/model/TWAVE work.
## Rejected Or Parked
- Bun relay calling Python classes directly.
- Auto-committing retroactive provenance.
- Broad `git add .` without review.
- Deleting model packs without inventory, backup, and rollback path.
- Heretic/uncensoring or fine-tuning in P0.
- External handoff before DoppelGround leak status is resolved.
- Claims of cryptographic VAP, full A2A, OWASP ASI, SkillFortify, or production ASBOM maturity unless locally verified.
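The rejection of broad `git add .` could be enforced with a small pre-staging check such as the one sketched below. The helper name `check_staging_args` and the exact set of rejected pathspecs are assumptions; the project may enforce this rule differently, or only by convention.

```python
from typing import List

# Arguments to `git add` that stage more than one explicitly named file.
BROAD_PATHSPECS = {".", ":/", "-A", "--all", "-u", "--update"}
GLOB_CHARS = "*?["


def check_staging_args(paths: List[str]) -> List[str]:
    """Return `paths` unchanged if every entry is an explicit path.

    Raises ValueError for an empty list, broad pathspecs, or glob patterns.
    """
    if not paths:
        raise ValueError("no paths given; stage explicit files only")
    for path in paths:
        if path in BROAD_PATHSPECS:
            raise ValueError(f"broad pathspec not allowed: {path!r}")
        if any(ch in path for ch in GLOB_CHARS):
            raise ValueError(f"glob pattern not allowed: {path!r}")
    return paths
```

A wrapper script could call this before invoking `git add`, so that only reviewed, named files ever reach the index.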
## Critical Blockers
1. DoppelGround leak status must be resolved before external handoff or public repo flip.
2. Dashboard/relay still needs real governance API wiring.
3. GSPP reference assets from Downloads need reconciliation before they become canonical.
4. Public launch files still need security/legal review before public release.
5. Sandbox/mock env files must not be committed without an explicit policy decision.
6. Review-chain package claims must be grounded against tracked repo files before implementation.
## Canonical P0 Sequence
1. Reverify the test baseline before core commits.
2. Keep Git clean with explicit-path staging only.
3. Triage the DoppelGround gitleaks report, classifying each finding as a real secret or a false positive.
4. Add or update a canonical integration ledger for repos, ports, APIs, and protected files.
5. Build Python/FastAPI governance endpoints: `/skills/propose`, `/skills/status/{id}`, `/dashboard/stats`, `/governance/proposals`, `/governance/approve`.
6. Update dashboard/relay to consume the Python governance API.
7. Add `nexus-scan.py` as dry-run provenance inventory only.
8. Add `model_arena/mini_arena.py` as report-only evidence collection.
9. Build `nexus_knowledge_base/` from sanitized DoppelGround exports with evidence hashes and quality labels.
10. Handoff to external teams only after security and governance API gates pass.
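To make step 5 more concrete, here is a framework-free sketch of the state those five endpoints might expose. The FastAPI routing itself is omitted. The class `ProposalStore`, the `pending`/`approved` status names, and the HTTP methods in the comments are assumptions for illustration; only the route paths come from the list above.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proposal:
    id: str
    skill: str
    status: str = "pending"  # assumed lifecycle: pending -> approved


class ProposalStore:
    """In-memory stand-in for the governance state behind the API."""

    def __init__(self) -> None:
        self._items: Dict[str, Proposal] = {}
        self._ids = itertools.count(1)

    def propose(self, skill: str) -> Proposal:
        # Would back: POST /skills/propose (method assumed)
        proposal = Proposal(id=f"p{next(self._ids)}", skill=skill)
        self._items[proposal.id] = proposal
        return proposal

    def status(self, proposal_id: str) -> str:
        # Would back: GET /skills/status/{id}; KeyError if the id is unknown
        return self._items[proposal_id].status

    def proposals(self) -> List[Proposal]:
        # Would back: GET /governance/proposals
        return list(self._items.values())

    def approve(self, proposal_id: str) -> Proposal:
        # Would back: POST /governance/approve
        proposal = self._items[proposal_id]
        proposal.status = "approved"
        return proposal

    def stats(self) -> Dict[str, int]:
        # Would back: GET /dashboard/stats
        counts = {"pending": 0, "approved": 0}
        for proposal in self._items.values():
            counts[proposal.status] += 1
        return counts
```

Keeping the state in one Python object is consistent with the principle that the dashboard and relay proxy governance state rather than make governance decisions themselves.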
## Port Map
- `7352`: Nexus governance/control API and dashboard stats.
- `7353`: TWAVE wrapper under `/twave/*`.
- `11434`: local Ollama; internal only, not for external teams.
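The port map could be captured as data so that tooling can check it, as in the sketch below. The `Service` type and `may_share_with_external_team` helper are hypothetical. This document marks only `11434` as internal-only; treating `7352` and `7353` as shareable, and denying unknown ports, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Service:
    role: str
    internal_only: bool


PORT_MAP: Dict[int, Service] = {
    7352: Service("Nexus governance/control API and dashboard stats", False),
    7353: Service("TWAVE wrapper under /twave/*", False),
    11434: Service("local Ollama", True),
}


def may_share_with_external_team(port: int) -> bool:
    """Return True only for known ports that are not marked internal-only.

    Unknown ports are denied by default (an assumption of this sketch).
    """
    service = PORT_MAP.get(port)
    return service is not None and not service.internal_only
```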
## Untracked Drafts Requiring Review
These files/directories are currently untracked and intentionally not committed yet:
- `nexus_knowledge_base/`
- `sandbox/`
- `test_integration.py`
Reason: they contain policy, onboarding, sandbox, or integration-test draft content that needs separate content/security review.
`CONTRIBUTING.md` and `ONBOARDING.md` have been reviewed, committed, and promoted to canonical documentation.
# NEXUS OS - Canonical State v3.2 (2026-04-28)
Status: Phase 0 Grounding Complete | 674 tests passing
## Core Thesis
Nexus OS turns local models, research evidence, and external teams into a governed, audited, token-efficient execution system.
## Current Focus
- Token saver system refinement
- Skill-crafting and self-learning data collection
- Continuous improvement of agent workflows
- Governance hardening (port 7352)
## Key Systems
- Trust Scoring v2.1 (lane-scoped, non-compensatory)
- TokenGuard (budget enforcement + 429 responses)
- 5-Track Memory (event, trust, capability, failure, governance)
- GMR Engine (intelligent model routing)
- SkillSmith (self-improvement loop)
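As an illustration of the TokenGuard entry above, budget enforcement with a 429 response might look like the following. This is a simplified sketch: the `TokenBudget` class and its `charge` method are hypothetical, and the real TokenGuard may track budgets per agent, per lane, or per time window.

```python
class TokenBudget:
    """Tracks token spend against a fixed limit."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record `tokens` if they fit; return an HTTP-style status code.

        Returns 200 when the spend is recorded and 429 (Too Many Requests)
        when it would exceed the budget, in which case nothing is recorded.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            return 429
        self.used += tokens
        return 200
```

Rejecting the whole request instead of partially charging it keeps the recorded spend equal to the work that actually ran.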
## Data Collection Priority
Gather token savings data, success patterns, and failure patterns to refine:
- Skill-crafting system
- Self-learning algorithms
- Prompt optimization
- Workflow efficiency
## Next Milestone
Integrate collected data into SkillSmith for autonomous improvement.
