---
title: Multi-Agent Land
emoji: 🌲
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: "6.16.0"
python_version: "3.10"
app_file: app.py
pinned: true
tags:
  - agent-demo-track
  - track:wood
  - sponsor:openai
  - sponsor:nvidia
  - sponsor:modal
  - sponsor:openbmb
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
---

# Multi-Agent Land

**Small models, one shared log, and a clear view of how agents behave in motion.**

Most multi-agent systems are hard to inspect: agents call each other directly and the state gets messy. We wanted to *see* small agents in action: not isolated prompts, but small models interacting over time, **debating, collaborating, playing games, and pushing each other** in a shared environment.

So we built one. Every action (thoughts, tool calls, state updates) is appended to a single **immutable log**. When one agent asks, another answers, a judge evaluates, and a keeper tracks progress, *nothing is sent agent-to-agent*; every interaction flows through that one shared ledger. That way you can follow the whole run, step by step.

---

## Submission

- **🎬 Demo video:** https://youtu.be/v8-zR6eTbDM
- **📣 Social post:** https://www.linkedin.com/posts/gharsallah_huggingface-hackathon-buildsmall-activity-7472383877991501824-8vxO
- **💻 GitHub link:** https://github.com/abducodez/multi-agent-land
- **👥 Team (Hugging Face usernames):**
  - `@agharsalah`

> Tags (track + badges) live in the YAML block at the top of this README; without
> them the project can't be placed in a category.

---

## Quickstart

```bash
uv sync                 # create .venv and install everything from the lockfile

# Optional: configure live inference (else the app runs fully offline)
cp .env.example .env    # then set MODAL_WORKSPACE

uv run app.py
```

> Don't have [uv](https://docs.astral.sh/uv/)?
> `curl -LsSf https://astral.sh/uv/install.sh | sh`

The app runs on a **deterministic local stub** with no API key, which is great for testing and demos that need to be fully reproducible. To go live, deploy the small models in [`modal/`](modal/README.md) and set `MODAL_WORKSPACE` in `.env`; every agent then binds to its model by *catalogue key* (`modal/catalogue.py`). There is no generic cloud key: live inference is always against models you deploy yourself.

### Run it live

By default the app runs fully offline on the deterministic stub. To use real small-model inference (Modal-served models, a persistent Neon/Postgres ledger, and the optional mem0 memory index), copy `.env.example` to `.env` and set the relevant variables. A live run stays bounded by the Governor and the UI auto-stops autoplay at budget/verdict, so it won't loop forever. See [docs/runbook-live-mode.md](docs/runbook-live-mode.md) for the step-by-step runbook and the safety story.

### Run tests

```bash
uv run pytest tests/ -v
```

---

## What It Is

A **tiny theater engine** powered by specialist small-model agents. Agents never call each other directly; they post typed events to a shared append-only ledger, and every view (the stage, the memory, the UI) is a projection derived from that log.

**The loop is simple:** define the environment and the agent roles, then launch them in the multi-agent lab. Each scenario can run for a long time (agents debate, collaborate, play games, and push each other) while a live telemetry view lets you follow the whole run step by step.

What makes it *super modular*:

- **Config, not code.** Agents, scenarios, casts, model tiers, tool grants, and budgets are declarative YAML under `config/`, validated by a schema. Add a world by adding files, as proven by `tests/test_modularity.py` (zero engine edits).
- **A model per agent.** Each agent declares a logical profile (`tiny`/`fast`/`balanced`/`strong`); a `ModelRouter` binds it to a concrete small model served on Modal: **Nemotron, MiniCPM, Gemma**. Mix a ≤4B worker with a ≤32B judge in one cast.
- **Capability-checked tools.** Agents call tools only if their manifest grants them. The same contract fronts in-process tools today and MCP servers later.
- **Built to run for hours.** The ledger is the checkpoint: `restore()` resumes a killed run; a token-aware governor bounds spend; `step(n_ticks=N)` maps one wall-clock episode onto N sim-ticks.

The user can **Start** from any seed, **Advance** a turn, **Drop** a disturbance, and **Switch** scenarios, all live.

### Scenarios (each is a YAML config)

| Name | Cognitive task | Cast (model tiers) |
|---|---|---|
| 🍄 Thousand Token Wood | Divergent world-growth | Seedkeeper `fast`, Critic `balanced`, Pocket Actor `tiny`, Echo `fast` |
| 🔍 Mystery Roots | Convergent mystery-solving | Clue Gatherer `fast`, Hypothesis Former `balanced`, Devil's Advocate `fast`, Judge `strong` |
| 🔮 Oracle Grove | Tool-using prophecy | Seedkeeper `fast`, Fortune-Teller `fast` + `oracle` tool |

Adding a fourth scenario is a new YAML file in `config/scenarios/`. **Zero engine edits.**

---

## Architecture in 90 seconds

```
config/ (YAML) → Registry → Scenario(cast) + ModelRouter + ToolRegistry
   │
Visitor seed or disturbance
   │
Conductor ← Governor (calls + tokens + spend)
   │  subscription queue + tick schedule → [Agent₁, Agent₂, ...]
   │
ContextBuilder              ModelRouter.for_profile(tiny|fast|balanced|strong)
 ├ persona                     │  → the right small model per agent
 ├ shared goal                 ▼
 ├ scene (projection)       inference → structured JSON event
 ├ memory (ledger view, windowed or salience-ranked)
 └ visitor + granted tools
   │
Typed Event → Ledger.append()  (idempotent; SQLite-backed for long runs)
   │
Projections update → Observer (read-only) → Gradio UI (stage + ledger + stats + live config)
```

The live theater, the two-tab **Fishbowl** UI (Lab + Show) built on this read surface, is documented as-built in [docs/architecture/fishbowl-ui.md](docs/architecture/fishbowl-ui.md).

### Key decisions (see `docs/adr/` for full reasoning)

| # | Decision |
|---|---|
| 0001 | Append-only event ledger as the sole source of truth |
| 0002 | Gradio as the UI layer |
| 0003 | Small specialist agents over one large model |
| 0004 | Document every architectural decision as we build |
| 0005 | Agent memory is a ledger view, not a separate store |
| 0006 | `ContextBuilder` owns prompt assembly; agents own only persona + action |
| 0007 | `Governor` is injected into the conductor to enforce call budgets |
| 0008 | Zero engine edits required to add a second scenario |
| 0009 | Event kinds are open + format-validated; authority lives in `may_emit` |
| 0010 | Per-agent model routing via logical profiles (`ModelRouter`) |
| 0011 | Declarative, validatable config; UI/LLM-generatable (`WorldConfig`) |
| 0012 | Capability-based tool contract (`ToolRegistry`); MCP-ready |
| 0013 | Token-aware governor + long-running foundations (restore/snapshot/two-clock) |
| 0014 | Small models served on Modal, one OpenAI-compatible app per provider |

---

## Add a world without code

Drop two YAML files into `config/` and the new world appears in the app, with no engine edit.

```yaml
# config/agents/town-crier.yaml
name: town-crier
persona: You are the Town Crier. Announce one bit of news in a sentence.
may_emit: [crier.announced]     # a brand-new namespaced kind, minted by config
schedule: { tick_every: 1 }
model_profile: tiny             # routed to a ≤4B model
```

```yaml
# config/scenarios/town-square.yaml
name: town-square
title: "📣 Town Square"
goal: Keep the square informed.
default_seed: Market day in a town that forgets its own name nightly.
cast: [town-crier]              # who participates
```

A UI form or an LLM can emit the same structure and validate it before running: `validate_world({...})` raises if a cast names an undefined agent. The invariant is enforced by a test (`tests/test_modularity.py`). See [docs/architecture/config-system.md](docs/architecture/config-system.md).

## Repository map

```
app.py                      Gradio composition root (loads scenarios from config/)
config/                     THE configurable surface (declarative, validatable)
  models.yaml               Logical profile → catalogue key (model lives in modal/catalogue.py)
  agents/*.yaml             One AgentManifest per agent
  scenarios/*.yaml          One ScenarioConfig per scenario (cast = agent names)
src/
  core/
    events.py               Event schema: open, namespaced, validated kinds
    ledger.py               Append-only in-memory ledger
    sqlite_ledger.py        Persistent ledger (WAL, snapshot, restore, tail)
    projections.py          Pure-function stage projection (+ generic kind fallback)
    conductor.py            Two-clock loop, subscription+tick routing, restore/snapshot
    memory.py               Episodic / salience / reflection, all ledger views
    context.py              ContextBuilder: layered prompt assembly
    governor.py             Budget guard (calls + tokens + spend)
    manifest.py             AgentManifest: the agent contract + resolve_model
    config.py               ScenarioConfig / ModelsConfig / WorldConfig + validators
    registry.py             Loads config/, resolves casts, binds handlers
    structured.py           JSON output instruction + tolerant parser
    observer.py             Read-only renderer with view diffs
  agents/
    base.py                 Agent ABC + ManifestAgent (the workhorse)
    handlers.py             Behaviour handlers (e.g.
                            FortuneTeller, which calls a tool)
  scenarios/
    base.py                 Scenario dataclass (goal, genesis, legacy schedule)
    thousand_token_wood.py  Thin build_scenario() → registry
    mystery_roots.py        Thin build_scenario() → registry
  models/
    provider.py             ModelProvider ABC + DeterministicTinyModel + usage
    openai_compat.py        OpenAI-compatible provider + credentials check
    router.py               ModelRouter: per-agent profile → small model
  tools/
    registry.py             ToolRegistry: capability-checked broker
    builtins.py             oracle tool + default_tool_registry()
  ui/
    render.py               Gradio rendering helpers + live config panel
tests/                      185 passing tests, zero mocks
docs/
  vision.md                 One-page product and technical vision
  architecture/             Overview, model-routing, config-system, tool-contract, fishbowl-ui, …
  adr/                      Architecture Decision Records (0001–0014)
  schema/                   events / agent-manifest / scenario-config / world-config
  runbooks/
  strategy/
  blog/
  journal/
scripts/
  resume_run.py             Resume a long-running scenario from a SQLite ledger
  new_journal_entry.py      Creates dated build log entries
  snapshot_progress.py      Updates docs/blog/building-in-public.md from journal
modal/                      OpenAI-compatible small-model serving on Modal
  service.py                Reusable vLLM serving layer (ModelConfig, register_model)
  registry.py               Declarative model catalogue, grouped by provider
  app_*.py                  One Modal app per provider (nvidia/openbmb/google)
  openapi.yaml              Checked-in OpenAPI 3.1 spec for the served API
  client.py                 OpenAI-SDK smoke-test client
  docs/                     Deploy guide, OpenAPI reference, Modal docs mirror
modal_app.py                Optional: serverless scheduled run (Modal)
```

---

## Hackathon targets

- **Genuinely delightful**: strange, joyful, worth showing a friend
- **AI is load-bearing**: agents generate the evolving scene; the user does not author it
- **Small models**: every runtime model ≤ 32B, with an optional ≤ 4B Tiny Titan mode
- **Polished Gradio**: custom theme, live ledger, visible agent trace, demo-ready seeds
- **Prize stacking**: Thousand Token Wood, Community
  Choice, OpenAI Track, Tiny Titan, Best Agent, Off-Brand UI, Best Demo, Judges' Wildcard

---

## Development loop

```bash
# 1. Build the thinnest slice

# 2. Record the decision
uv run python -c "from scripts.new_journal_entry import main; main()" "What changed today"

# 3. Regenerate the living blog
uv run scripts/snapshot_progress.py

# 4. Confirm nothing broke
uv run pytest tests/ -q
```

---

Small models, one shared log, and a clear view of how agents behave in motion. **This is Multi-Agent Land.**
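
For readers who want the core idea in code, here is a minimal sketch of the append-only ledger and projection pattern described under "What It Is". It is an illustration under stated assumptions, not the code in `src/core/`: the `Event` fields, the `Ledger` class, the `project_stage` function, and the `critic.noted` event kind are all hypothetical names. It shows two properties this README relies on: an append is idempotent per event id, and a view is a pure function of the log.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One immutable ledger entry (hypothetical shape, not the repo's schema)."""
    event_id: str   # unique id, used to make append idempotent
    agent: str      # name of the agent that emitted the event
    kind: str       # namespaced kind, e.g. "crier.announced"
    payload: dict[str, Any] = field(default_factory=dict)


class Ledger:
    """Append-only in-memory log: entries are never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._seen: set[str] = set()

    def append(self, event: Event) -> bool:
        """Store the event once per event_id; return False for a repeat."""
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self._events.append(event)
        return True

    def events(self) -> tuple[Event, ...]:
        """Read-only snapshot of the log, in append order."""
        return tuple(self._events)


def project_stage(events: tuple[Event, ...]) -> dict[str, list[str]]:
    """Pure projection: fold the log into messages grouped by event kind."""
    stage: dict[str, list[str]] = {}
    for e in events:
        line = f"{e.agent}: {e.payload.get('text', '')}"
        stage.setdefault(e.kind, []).append(line)
    return stage


ledger = Ledger()
ledger.append(Event("e1", "town-crier", "crier.announced", {"text": "Market day!"}))
# Same event_id again: ignored, so a retried write cannot duplicate history.
ledger.append(Event("e1", "town-crier", "crier.announced", {"text": "Market day!"}))
ledger.append(Event("e2", "critic", "critic.noted", {"text": "Too loud."}))

stage = project_stage(ledger.events())
print(stage)
```

Because the stage is rebuilt from the log alone, replaying the same events always yields the same view, which is the property that lets the ledger double as a checkpoint.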