Roadmap
Reprioritization: a senior-architect review of the current tree (next-steps/architecture-review-and-next-steps.md) argues for a short correctness + observability sprint (real cross-agent visibility, tool and causality events) before resuming the reach-oriented phases below. To add a world, see scenario-authoring.md.
Legend: each phase and item is marked done, foundations built (depth remaining), or planned.
Phase 0: Foundation (done)
Gradio shell, in-memory append-only ledger, deterministic tiny agents, docs spine (vision, ADR-0001–0004, schema, runbooks), build-journal automation.
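The sketch below is a minimal Python picture of what an "in-memory append-only ledger" means in practice. The Event fields (seq, kind, actor, payload) and the InMemoryLedger class are hypothetical stand-ins and do not reproduce src/core/events.py or src/core/ledger.py. The only point it makes is that events can be appended and read back but not edited.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape for illustration; not the project's schema."""
    seq: int                      # position in the ledger, assigned on append
    kind: str                     # open string such as "agent.said"
    actor: str
    payload: Mapping[str, Any]    # read-only view of the event data


class InMemoryLedger:
    """Append-only: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, actor: str, payload: Dict[str, Any]) -> Event:
        # Shallow-copy the payload and expose it read-only, so later edits to the
        # caller's dict do not leak into history and readers cannot reassign keys.
        event = Event(len(self._events), kind, actor, MappingProxyType(dict(payload)))
        self._events.append(event)
        return event

    def read(self, since: int = 0) -> Tuple[Event, ...]:
        # Return a tuple snapshot rather than the live list.
        return tuple(self._events[since:])
```

Because events are frozen and read() returns a snapshot, a component holding an old event cannot rewrite what the ledger recorded.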
Phase 1: Memory + Second Scenario (done)
EpisodicMemory, ContextBuilder, Governor (call caps), OpenAI-compatible provider, Mystery Roots scenario, two-scenario Gradio UI, ADR-0005–0008.
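One simple way to picture the Governor's call caps is a per-agent counter plus a run-wide total, both checked before every model call. CallCapGovernor, check_and_count and BudgetExceeded are hypothetical names. The roadmap says only that the Governor enforces call caps.

```python
from typing import Dict


class BudgetExceeded(RuntimeError):
    """Raised when a model call would go over a cap (hypothetical name)."""


class CallCapGovernor:
    """Hypothetical call-cap governor: per-agent and run-wide limits on model calls."""

    def __init__(self, per_agent_cap: int, total_cap: int) -> None:
        self.per_agent_cap = per_agent_cap
        self.total_cap = total_cap
        self.calls: Dict[str, int] = {}
        self.total = 0

    def check_and_count(self, agent_id: str) -> None:
        """Call before each model request; raises instead of letting the call through."""
        used = self.calls.get(agent_id, 0)
        if used >= self.per_agent_cap:
            raise BudgetExceeded(f"{agent_id} reached {self.per_agent_cap} calls")
        if self.total >= self.total_cap:
            raise BudgetExceeded(f"run reached {self.total_cap} calls")
        # Count only after both checks pass, so a refused call uses no budget.
        self.calls[agent_id] = used + 1
        self.total += 1
```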
Phase 2: Reflection + Structured Output (done)
- ReflectionTracker wired into ManifestAgent.act(), which emits agent.reflected. The agent.reflected kind is first-class and is rendered as a belief in the projection.
- JSON constraint block in every agent prompt; the _raw_fallback rate is shown in stats.
- All shipped agents converted from Agent to manifest-driven config; output_extra_fields shapes per-scenario payloads.
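The JSON constraint block, per-scenario extra fields and the _raw_fallback metric might fit together roughly as below. The prompt wording, the say/think default fields and all three helper functions are assumptions. From the roadmap come only four ideas: a constraint block in every prompt, per-scenario extra output fields, a raw-text fallback when parsing fails, and a reported fallback rate.

```python
import json
from typing import Any, Dict, List, Optional

BASE_FIELDS = {"say": "string", "think": "string"}  # hypothetical default shape


def json_constraint(extra_fields: Optional[Dict[str, str]] = None) -> str:
    """Build the constraint block appended to every agent prompt."""
    fields = {**BASE_FIELDS, **(extra_fields or {})}   # per-scenario extras merged in
    shape = ", ".join(f'"{name}": <{kind}>' for name, kind in fields.items())
    return f"Respond with ONLY a JSON object of the form {{{shape}}} and nothing else."


def parse_agent_output(raw: str) -> Dict[str, Any]:
    """Parse a reply as a JSON object; if that fails, keep the raw text and flag it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        return data
    # Fallback path: the run keeps going, and the stats panel can count this reply.
    return {"say": raw.strip(), "_raw_fallback": True}


def raw_fallback_rate(outputs: List[Dict[str, Any]]) -> float:
    """Fraction of parsed replies that needed the fallback."""
    if not outputs:
        return 0.0
    return sum(1 for o in outputs if o.get("_raw_fallback")) / len(outputs)
```

With one well-formed reply and one chatty reply such as "Sure! Hello.", raw_fallback_rate returns 0.5. That is the kind of number the Phase 7 target of under 10% would be measured against.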
Phase 3: Persistence + Memory (foundations built, depth remaining)
- Done: SQLiteLedger (WAL, idempotent, snapshot_to, from_file, tail).
- Done: Conductor.restore() + snapshot_every; scripts/resume_run.py.
- Planned: embedding-based relevance in SalienceMemory (still keyword overlap).
- Planned: pgvector upgrade path for episodic retrieval at scale.
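The sketch below shows how the SQLiteLedger properties listed above (WAL, idempotent appends, tail, snapshot_to, from_file) can be met with Python's standard sqlite3 module. The table layout, the caller-supplied event_id used for idempotency and every signature here are assumptions, not the project's code.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


class SQLiteLedger:
    """Sketch of an append-only, idempotent event ledger on SQLite (hypothetical API)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # WAL lets readers (e.g. a UI) read while the run keeps appending.
        # In-memory databases silently stay in "memory" journal mode.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " seq INTEGER PRIMARY KEY,"        # rowid alias: global append order
            " event_id TEXT NOT NULL UNIQUE,"   # caller-chosen id makes retries safe
            " kind TEXT NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        self.conn.commit()

    @classmethod
    def from_file(cls, path: str) -> "SQLiteLedger":
        """Open (or create) a ledger stored at path."""
        return cls(sqlite3.connect(path))

    def append(self, event_id: str, kind: str, payload: Dict[str, Any]) -> bool:
        """Insert once; re-appending the same event_id is a no-op that returns False."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO events (event_id, kind, payload) VALUES (?, ?, ?)",
            (event_id, kind, json.dumps(payload)),
        )
        self.conn.commit()
        return cur.rowcount == 1

    def tail(self, n: int) -> List[Tuple[int, str, Dict[str, Any]]]:
        """Return the last n events, oldest first."""
        rows = self.conn.execute(
            "SELECT seq, kind, payload FROM events ORDER BY seq DESC LIMIT ?", (n,)
        ).fetchall()
        return [(seq, kind, json.loads(p)) for seq, kind, p in reversed(rows)]

    def snapshot_to(self, path: str) -> None:
        """Copy a consistent image of the ledger to another file (online backup API)."""
        dest = sqlite3.connect(path)
        try:
            self.conn.backup(dest)
        finally:
            dest.close()
```

Idempotency here rests on the UNIQUE event_id plus INSERT OR IGNORE. A resumed run that replays its last few appends therefore does not duplicate events.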
Phase 4: Declarative Config + Tools (done; live MCP planned)
- Done: YAML manifests + scenario configs + models.yaml under config/.
- Done: src/core/registry.py: loader, cast resolution, handler binding.
- Done: WorldConfig + validate_world/agent/scenario (UI/LLM-generatable, ADR-0011).
- Done: capability-checked ToolRegistry + oracle tool + oracle-grove scenario (ADR-0012).
- Done: tests/test_modularity.py adds a new agent and scenario with zero engine edits.
- Planned: live MCP servers (image-gen, web-fetch) behind the same tool contract.
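The capability check on ToolRegistry can be sketched as follows. The sketch assumes each tool declares one capability string and each agent manifest grants a set of them. Tool, ToolRegistry.call, CapabilityError and the "oracle.consult" capability name are assumptions for illustration. The actual contract lives in src/tools/registry.py.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


class CapabilityError(PermissionError):
    """Raised when an agent calls a tool its manifest does not grant (hypothetical)."""


@dataclass(frozen=True)
class Tool:
    name: str
    capability: str              # e.g. "oracle.consult" (hypothetical)
    fn: Callable[..., Any]       # local function or a wrapper around a remote client


class ToolRegistry:
    """Sketch of a capability-checked registry (hypothetical API)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool {tool.name!r} already registered")
        self._tools[tool.name] = tool

    def call(self, granted: FrozenSet[str], name: str, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise KeyError(f"unknown tool {name!r}")
        # The gate looks only at the capability string, never at how fn works.
        if tool.capability not in granted:
            raise CapabilityError(f"{name!r} needs capability {tool.capability!r}")
        return tool.fn(**kwargs)
```

Because the check only inspects the capability string, a local function and a remote MCP client wrapped as fn would pass through the same gate. That fits the plan to put live MCP servers behind the same tool contract.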
Phase 5: Long-Running + Durable Execution (foundations built, depth remaining)
- Done: token-aware Governor (max_total_tokens, hourly_budget_usd); per-agent token metering (ADR-0013).
- Done: two-clock step(n_ticks=N); ledger-as-checkpoint resume.
- Serverless deploy: modal_app.py (scheduled run on a persistent volume).
- Wall-clock cron + episode export (scripts/export_episode.py).
- Planned: Temporal / Inngest durable wrapper; OpenTelemetry tracing; cost telemetry hook.
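Here is one way the two budget parameters named above could be enforced: a hard cap on total tokens plus a rolling 60-minute spend window. Several pieces are assumptions: the flat usd_per_1k_tokens price, the rolling-window reading of hourly_budget_usd, and the TokenGovernor and charge names. Per-agent metering is kept as a plain dictionary.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class TokenBudgetExceeded(RuntimeError):
    """Raised when a call would break a token or spend budget (hypothetical name)."""


class TokenGovernor:
    """Hypothetical token-aware governor.

    max_total_tokens: hard cap on tokens for the whole run.
    hourly_budget_usd: spend allowed in any rolling 60-minute window (assumed reading).
    """

    def __init__(self, max_total_tokens: int, hourly_budget_usd: float,
                 usd_per_1k_tokens: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_total_tokens = max_total_tokens
        self.hourly_budget_usd = hourly_budget_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens   # assumed flat price
        self.clock = clock                            # injectable for tests
        self.total_tokens = 0
        self.per_agent: Dict[str, int] = {}           # per-agent token metering
        self._spend: Deque[Tuple[float, float]] = deque()  # (timestamp, usd)

    def _spend_last_hour(self, now: float) -> float:
        # Drop charges 60 minutes old or older, then sum what is left.
        while self._spend and now - self._spend[0][0] >= 3600:
            self._spend.popleft()
        return sum(usd for _, usd in self._spend)

    def charge(self, agent_id: str, tokens: int) -> None:
        """Reserve tokens for one call, or raise if either budget would be exceeded."""
        now = self.clock()
        usd = tokens / 1000 * self.usd_per_1k_tokens
        if self.total_tokens + tokens > self.max_total_tokens:
            raise TokenBudgetExceeded("max_total_tokens would be exceeded")
        if self._spend_last_hour(now) + usd > self.hourly_budget_usd:
            raise TokenBudgetExceeded("hourly_budget_usd would be exceeded")
        self.total_tokens += tokens
        self.per_agent[agent_id] = self.per_agent.get(agent_id, 0) + tokens
        self._spend.append((now, usd))
```

Injecting the clock keeps the governor deterministic in tests. It would also let a simulated clock and a wall clock share the same budget logic.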
Phase 6: Illustrated Serial (Third Scenario) (planned)
Draft → critique → revise creative loop on a wall-clock cadence; an Artist agent backed by an image-gen MCP tool; an episode gallery. The goal is to prove modularity across a third, structurally different scenario (oracle-grove already proves a tool-using cast).
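The draft → critique → revise loop is still planned, so this is only a sketch of its shape. Several details are hypothetical: serial_episode, a critic that returns a list of issues, and the max_rounds stop condition. In the real scenario the critic and reviser would presumably be agent turns rather than plain functions.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures; in the planned scenario these would be agent turns.
Critic = Callable[[str], List[str]]          # returns issues; empty list means accept
Reviser = Callable[[str, List[str]], str]    # returns a revised draft


def serial_episode(draft: str, critique: Critic, revise: Reviser,
                   max_rounds: int = 3) -> Tuple[str, int]:
    """Run draft -> critique -> revise until the critic is satisfied or rounds run out.

    Returns the final draft and the number of revisions made.
    """
    for revisions in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, revisions
        draft = revise(draft, issues)
    return draft, max_rounds
```

Capping the rounds keeps each wall-clock episode bounded even if the critic never approves.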
Phase 7: Submission Package (planned)
UI polish, frozen demo seed + recorded run, social post, Codex judge rubric pass,
_raw_fallback < 10% with a live model, submit.
The UI-polish track is the Fishbowl redesign: a two-tab gr.HTML theater over the ledger (Lab + Show, with the say-vs-think MindCard). See ADR-0021 and next-steps/fishbowl-ui.md.
What is built right now
The four stable contracts are realized and exercised: open event schema, ledger API (memory + SQLite), declarative agent manifest, capability tool contract. Per-agent small-model routing, declarative validatable config, the modularity-invariant test, and long-running foundations (resume, snapshot, token budget, two-clock) are all in and green.
Architecture stability guarantee
The four contracts are frozen; only additions are allowed:
- Event schema (src/core/events.py): new kinds are additive by construction.
- Ledger API (src/core/ledger.py): the interface is frozen, not the implementation.
- Agent manifest (src/core/manifest.py): backward-compatible additions only.
- Tool contract (src/tools/registry.py): the capability contract is frozen, not the implementation.
Everything else (scenarios, agents, models, UI, persistence backend, tools) is hot-swappable via config/.
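The frozen-contract idea can be illustrated with a typing.Protocol for the ledger and an event whose kind is an open string. A projection then handles the kinds it knows and ignores the rest. Event, Ledger, ListLedger, project_beliefs and the tool.called kind are all hypothetical. They mirror the guarantees listed above, not the code in src/core/.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Event:
    kind: str                                   # open string: new kinds need no schema change
    payload: Dict[str, Any] = field(default_factory=dict)


class Ledger(Protocol):
    """The interface callers depend on; any backend with these methods fits."""
    def append(self, event: Event) -> None: ...
    def read(self) -> Iterable[Event]: ...


class ListLedger:
    """One interchangeable backend; a SQLite-backed class could satisfy the same Protocol."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read(self) -> Iterable[Event]:
        return iter(self._events)


def project_beliefs(ledger: Ledger) -> List[str]:
    """A projection that handles the kinds it knows and skips the rest."""
    beliefs = []
    for ev in ledger.read():
        if ev.kind == "agent.reflected":
            beliefs.append(ev.payload.get("belief", ""))
        # Kinds added later (e.g. a hypothetical "tool.called") fall through untouched.
    return beliefs
```

Adding a new kind requires no change to Event or to existing projections. That is the practical meaning of "additive by construction".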