| --- |
| title: Multi-Agent Land |
| emoji: 🌲 |
| colorFrom: green |
| colorTo: indigo |
| sdk: gradio |
| sdk_version: "6.16.0" |
| python_version: "3.10" |
| app_file: app.py |
| pinned: true |
| tags: |
| - agent-demo-track |
| - track:wood |
| - sponsor:openai |
| - sponsor:nvidia |
| - sponsor:modal |
| - sponsor:openbmb |
| - achievement:offbrand |
| - achievement:sharing |
| - achievement:fieldnotes |
| --- |
| |
| # Multi-Agent Land |
|
|
| **Small models, one shared log, and a clear view of how agents behave in motion.** |
|
|
| Most multi-agent systems are hard to inspect: agents call each other directly and the |
| state gets messy. We wanted to *see* small agents in action — not isolated prompts, but |
| small models interacting over time: **debating, collaborating, playing games, and |
| pushing each other** in a shared environment. |
|
|
| So we built one. Every action — thoughts, tool calls, state updates — is appended to a |
| single **immutable log**. When one agent asks, another answers, a judge evaluates, and a |
| keeper tracks progress, *nothing is sent agent-to-agent* — every interaction flows |
| through that one shared ledger. So you can follow the whole run, step by step. |
|
|
| --- |
|
|
| ## Submission |
|
|
|
|
| - **🎬 Demo video:** https://youtu.be/v8-zR6eTbDM |
| - **📣 Social post:** https://www.linkedin.com/posts/gharsallah_huggingface-hackathon-buildsmall-activity-7472383877991501824-8vxO |
| - **💻 GitHub link:** https://github.com/abducodez/multi-agent-land |
| - **👥 Team (Hugging Face usernames):** <!-- TODO: list every teammate's HF username; each must register + join the org separately --> |
| - `@agharsalah` |
| |
| > Tags (track + badges) live in the YAML block at the top of this README — without |
| > them the project can't be placed in a category. |
| |
| --- |
| |
| ## Quickstart |
| |
| ```bash |
| uv sync # create .venv and install everything from the lockfile |
| |
| # Optional: configure live inference (else the app runs fully offline) |
| cp .env.example .env # then set MODAL_WORKSPACE |
|
|
| uv run app.py |
| ``` |
| |
| > Don't have [uv](https://docs.astral.sh/uv/)? `curl -LsSf https://astral.sh/uv/install.sh | sh` |
| |
| The app runs on a **deterministic local stub** with no API key — great for testing |
| and demos that need to be fully reproducible. To go live, deploy the small models |
| in [`modal/`](modal/README.md) and set `MODAL_WORKSPACE` in `.env`; every agent then |
| binds to its model by *catalogue key* (`modal/catalogue.py`). There is no generic |
| cloud key — live inference is always against models you deploy yourself. |
| |
| ### Run it live |
| |
| By default the app runs fully offline on the deterministic stub. To use real |
| small-model inference — Modal-served models, a persistent Neon/Postgres ledger, |
| and the optional mem0 memory index — copy `.env.example` to `.env` and set the |
| relevant variables. A live run stays bounded by the Governor and the UI auto-stops |
| autoplay at budget/verdict, so it won't loop forever. |
| |
| See [docs/runbook-live-mode.md](docs/runbook-live-mode.md) for the step-by-step |
| runbook and the safety story. |
| |
| ### Run tests |
| |
| ```bash |
| uv run pytest tests/ -v |
| ``` |
| |
| --- |
| |
| ## What It Is |
| |
| A **tiny theater engine** powered by specialist small-model agents. Agents never |
| call each other directly — they post typed events to a shared append-only ledger, |
| and every view (the stage, the memory, the UI) is a projection derived from that log. |
| |
| **The loop is simple:** define the environment and the agent roles, then launch them in |
| the multi-agent lab. Each scenario can run for a long time — agents debate, collaborate, |
| play games, and push each other — while a live telemetry view lets you follow the whole |
| run step by step. |
| |
| What makes it *super modular*: |
| - **Config, not code.** Agents, scenarios, casts, model tiers, tool grants, and |
| budgets are declarative YAML under `config/`, validated by a schema. Add a world |
| by adding files — proven by `tests/test_modularity.py` (zero engine edits). |
| - **A model per agent.** Each agent declares a logical profile (`tiny`/`fast`/ |
| `balanced`/`strong`); a `ModelRouter` binds it to a concrete small model served on |
| Modal — **Nemotron, MiniCPM, Gemma**. Mix a ≤4B worker with a ≤32B judge in one cast. |
| - **Capability-checked tools.** Agents call tools only if their manifest grants |
| them — the contract that fronts in-process tools today and MCP servers later. |
| - **Built to run for hours.** The ledger is the checkpoint: `restore()` resumes a |
| killed run; a token-aware governor bounds spend; `step(n_ticks=N)` maps one |
| wall-clock episode onto N sim-ticks. |
| |
| The user can **Start** from any seed, **Advance** a turn, **Drop** a disturbance, |
| and **Switch** scenarios — all live. |
| |
| ### Scenarios (each is a YAML config) |
| |
| | Name | Cognitive task | Cast (model tiers) | |
| |---|---|---| |
| | 🍄 Thousand Token Wood | Divergent world-growth | Seedkeeper `fast`, Critic `balanced`, Pocket Actor `tiny`, Echo `fast` | |
| | 🔍 Mystery Roots | Convergent mystery-solving | Clue Gatherer `fast`, Hypothesis Former `balanced`, Devil's Advocate `fast`, Judge `strong` | |
| | 🔮 Oracle Grove | Tool-using prophecy | Seedkeeper `fast`, Fortune-Teller `fast` + `oracle` tool | |
| |
| Adding a fourth scenario is a new YAML file in `config/scenarios/`. **Zero engine edits.** |
| |
| --- |
| |
| ## Architecture in 90 seconds |
| |
| ``` |
| config/ (YAML) → Registry → Scenario(cast) + ModelRouter + ToolRegistry |
| │ |
| Visitor seed or disturbance |
| │ |
| Conductor ← Governor (calls + tokens + spend) |
| │ |
| subscription queue + tick schedule → [Agent₁, Agent₂, ...] |
| │ |
| ContextBuilder ModelRouter.for_profile(tiny|fast|balanced|strong) |
| ├ persona │ → the right small model per agent |
| ├ shared goal ▼ |
| ├ scene (projection) inference → structured JSON event |
| ├ memory (ledger view, windowed or salience-ranked) |
| └ visitor + granted tools |
| │ |
| Typed Event → Ledger.append() (idempotent; SQLite-backed for long runs) |
| │ |
| Projections update → Observer (read-only) → Gradio UI (stage + ledger + stats + live config) |
| ``` |
| |
| The live theater — the two-tab **Fishbowl** UI (Lab + Show) built on this read surface — |
| is documented as-built in |
| [docs/architecture/fishbowl-ui.md](docs/architecture/fishbowl-ui.md). |
|
|
| ### Key decisions (see `docs/adr/` for full reasoning) |
|
|
| | # | Decision | |
| |---|---| |
| | 0001 | Append-only event ledger as the sole source of truth | |
| | 0002 | Gradio as the UI layer | |
| | 0003 | Small specialist agents over one large model | |
| | 0004 | Document every architectural decision as we build | |
| | 0005 | Agent memory is a ledger view, not a separate store | |
| | 0006 | `ContextBuilder` owns prompt assembly; agents own only persona + action | |
| | 0007 | `Governor` is injected into the conductor to enforce call budgets | |
| | 0008 | Zero engine edits required to add a second scenario | |
| | 0009 | Event kinds are open + format-validated; authority lives in `may_emit` | |
| | 0010 | Per-agent model routing via logical profiles (`ModelRouter`) | |
| | 0011 | Declarative, validatable config — UI/LLM-generatable (`WorldConfig`) | |
| | 0012 | Capability-based tool contract (`ToolRegistry`); MCP-ready | |
| | 0013 | Token-aware governor + long-running foundations (restore/snapshot/two-clock) | |
| | 0014 | Small models served on Modal, one OpenAI-compatible app per provider | |
|
|
| --- |
|
|
| ## Add a world without code |
|
|
| Drop two YAML files into `config/` and it appears in the app — no engine edit. |
|
|
| ```yaml |
| # config/agents/town-crier.yaml |
| name: town-crier |
| persona: You are the Town Crier. Announce one bit of news in a sentence. |
| may_emit: [crier.announced] # a brand-new namespaced kind, minted by config |
| schedule: { tick_every: 1 } |
| model_profile: tiny # routed to a ≤4B model |
| ``` |
|
|
| ```yaml |
| # config/scenarios/town-square.yaml |
| name: town-square |
| title: "📣 Town Square" |
| goal: Keep the square informed. |
| default_seed: Market day in a town that forgets its own name nightly. |
| cast: [town-crier] # who participates |
| ``` |
|
|
| A UI form or an LLM can emit the same structure and validate it before running: |
| `validate_world({...})` raises if a cast names an undefined agent. The invariant is |
| enforced by a test (`tests/test_modularity.py`). See |
| [docs/architecture/config-system.md](docs/architecture/config-system.md). |
|
|
| ## Repository map |
|
|
| ``` |
| app.py Gradio composition root (loads scenarios from config/) |
| config/ THE configurable surface (declarative, validatable) |
| models.yaml Logical profile → catalogue key (model lives in modal/catalogue.py) |
| agents/*.yaml One AgentManifest per agent |
| scenarios/*.yaml One ScenarioConfig per scenario (cast = agent names) |
| src/ |
| core/ |
| events.py Event schema — open, namespaced, validated kinds |
| ledger.py Append-only in-memory ledger |
| sqlite_ledger.py Persistent ledger (WAL, snapshot, restore, tail) |
| projections.py Pure-function stage projection (+ generic kind fallback) |
| conductor.py Two-clock loop, subscription+tick routing, restore/snapshot |
| memory.py Episodic / salience / reflection — all ledger views |
| context.py ContextBuilder — layered prompt assembly |
| governor.py Budget guard (calls + tokens + spend) |
| manifest.py AgentManifest — the agent contract + resolve_model |
| config.py ScenarioConfig / ModelsConfig / WorldConfig + validators |
| registry.py Loads config/, resolves casts, binds handlers |
| structured.py JSON output instruction + tolerant parser |
| observer.py Read-only renderer with view diffs |
| agents/ |
| base.py Agent ABC + ManifestAgent (the workhorse) |
| handlers.py Behaviour handlers (e.g. FortuneTeller — calls a tool) |
| scenarios/ |
| base.py Scenario dataclass (goal, genesis, legacy schedule) |
| thousand_token_wood.py Thin build_scenario() → registry |
| mystery_roots.py Thin build_scenario() → registry |
| models/ |
| provider.py ModelProvider ABC + DeterministicTinyModel + usage |
| openai_compat.py OpenAI-compatible provider + credentials check |
| router.py ModelRouter — per-agent profile → small model |
| tools/ |
| registry.py ToolRegistry — capability-checked broker |
| builtins.py oracle tool + default_tool_registry() |
| ui/ |
| render.py Gradio rendering helpers + live config panel |
| tests/ 185 passing tests, zero mocks |
| docs/ |
| vision.md One-page product and technical vision |
| architecture/ Overview, model-routing, config-system, tool-contract, fishbowl-ui, … |
| adr/ Architecture Decision Records (0001–0013) |
| schema/ events / agent-manifest / scenario-config / world-config |
| runbooks/ strategy/ blog/ journal/ |
| scripts/ |
| resume_run.py Resume a long-running scenario from a SQLite ledger |
| new_journal_entry.py Creates dated build log entries |
| snapshot_progress.py Updates docs/blog/building-in-public.md from journal |
| modal/ OpenAI-compatible small-model serving on Modal |
| service.py Reusable vLLM serving layer (ModelConfig, register_model) |
| registry.py Declarative model catalogue, grouped by provider |
| app_*.py One Modal app per provider (nvidia/openbmb/google) |
| openapi.yaml Checked-in OpenAPI 3.1 spec for the served API |
| client.py OpenAI-SDK smoke-test client |
| docs/ Deploy guide, OpenAPI reference, Modal docs mirror |
| modal_app.py Optional: serverless scheduled run (Modal) |
| ``` |
|
|
| --- |
|
|
| ## Hackathon targets |
|
|
| - **Genuinely delightful** — strange, joyful, worth showing a friend |
| - **AI is load-bearing** — agents generate the evolving scene; the user does not author it |
| - **Small models** — every runtime model ≤ 32B, with an optional ≤ 4B Tiny Titan mode |
| - **Polished Gradio** — custom theme, live ledger, visible agent trace, demo-ready seeds |
| - **Prize stacking** — Thousand Token Wood, Community Choice, OpenAI Track, Tiny Titan, |
| Best Agent, Off-Brand UI, Best Demo, Judges' Wildcard |
|
|
| --- |
|
|
| ## Development loop |
|
|
| ```bash |
| # 1. Build the thinnest slice |
| # 2. Record the decision |
| uv run python -c "from scripts.new_journal_entry import main; main()" "What changed today" |
| # 3. Regenerate the living blog |
| uv run scripts/snapshot_progress.py |
| # 4. Confirm nothing broke |
| uv run pytest tests/ -q |
| ``` |
|
|
| --- |
|
|
| Small models, one shared log, and a clear view of how agents behave in motion. |
| **This is Multi-Agent Land.** |
|
|