A newer version of the Gradio SDK is available: 6.19.0
title: Multi-Agent Land
emoji: ๐ฒ
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.16.0
python_version: '3.10'
app_file: app.py
pinned: true
tags:
- agent-demo-track
- track:wood
- sponsor:openai
- sponsor:nvidia
- sponsor:modal
- sponsor:openbmb
- achievement:offbrand
- achievement:sharing
- achievement:fieldnotes
Multi-Agent Land
Small models, one shared log, and a clear view of how agents behave in motion.
Most multi-agent systems are hard to inspect: agents call each other directly and the state gets messy. We wanted to see small agents in action โ not isolated prompts, but small models interacting over time: debating, collaborating, playing games, and pushing each other in a shared environment.
So we built one. Every action โ thoughts, tool calls, state updates โ is appended to a single immutable log. When one agent asks, another answers, a judge evaluates, and a keeper tracks progress, nothing is sent agent-to-agent โ every interaction flows through that one shared ledger. So you can follow the whole run, step by step.
Submission
- ๐ฌ Demo video: https://youtu.be/v8-zR6eTbDM
- ๐ฃ Social post: https://www.linkedin.com/posts/gharsallah_huggingface-hackathon-buildsmall-activity-7472383877991501824-8vxO
- ๐ป GitHub link: https://github.com/abducodez/multi-agent-land
- ๐ฅ Team (Hugging Face usernames):
@agharsalah
Tags (track + badges) live in the YAML block at the top of this README โ without them the project can't be placed in a category.
Quickstart
uv sync # create .venv and install everything from the lockfile
# Optional: configure live inference (else the app runs fully offline)
cp .env.example .env # then set MODAL_WORKSPACE
uv run app.py
Don't have uv?
curl -LsSf https://astral.sh/uv/install.sh | sh
The app runs on a deterministic local stub with no API key โ great for testing
and demos that need to be fully reproducible. To go live, deploy the small models
in modal/ and set MODAL_WORKSPACE in .env; every agent then
binds to its model by catalogue key (modal/catalogue.py). There is no generic
cloud key โ live inference is always against models you deploy yourself.
Run it live
By default the app runs fully offline on the deterministic stub. To use real
small-model inference โ Modal-served models, a persistent Neon/Postgres ledger,
and the optional mem0 memory index โ copy .env.example to .env and set the
relevant variables. A live run stays bounded by the Governor and the UI auto-stops
autoplay at budget/verdict, so it won't loop forever.
See docs/runbook-live-mode.md for the step-by-step runbook and the safety story.
Run tests
uv run pytest tests/ -v
What It Is
A tiny theater engine powered by specialist small-model agents. Agents never call each other directly โ they post typed events to a shared append-only ledger, and every view (the stage, the memory, the UI) is a projection derived from that log.
The loop is simple: define the environment and the agent roles, then launch them in the multi-agent lab. Each scenario can run for a long time โ agents debate, collaborate, play games, and push each other โ while a live telemetry view lets you follow the whole run step by step.
What makes it super modular:
- Config, not code. Agents, scenarios, casts, model tiers, tool grants, and
budgets are declarative YAML under
config/, validated by a schema. Add a world by adding files โ proven bytests/test_modularity.py(zero engine edits). - A model per agent. Each agent declares a logical profile (
tiny/fast/balanced/strong); aModelRouterbinds it to a concrete small model served on Modal โ Nemotron, MiniCPM, Gemma. Mix a โค4B worker with a โค32B judge in one cast. - Capability-checked tools. Agents call tools only if their manifest grants them โ the contract that fronts in-process tools today and MCP servers later.
- Built to run for hours. The ledger is the checkpoint:
restore()resumes a killed run; a token-aware governor bounds spend;step(n_ticks=N)maps one wall-clock episode onto N sim-ticks.
The user can Start from any seed, Advance a turn, Drop a disturbance, and Switch scenarios โ all live.
Scenarios (each is a YAML config)
| Name | Cognitive task | Cast (model tiers) |
|---|---|---|
| ๐ Thousand Token Wood | Divergent world-growth | Seedkeeper fast, Critic balanced, Pocket Actor tiny, Echo fast |
| ๐ Mystery Roots | Convergent mystery-solving | Clue Gatherer fast, Hypothesis Former balanced, Devil's Advocate fast, Judge strong |
| ๐ฎ Oracle Grove | Tool-using prophecy | Seedkeeper fast, Fortune-Teller fast + oracle tool |
Adding a fourth scenario is a new YAML file in config/scenarios/. Zero engine edits.
Architecture in 90 seconds
config/ (YAML) โ Registry โ Scenario(cast) + ModelRouter + ToolRegistry
โ
Visitor seed or disturbance
โ
Conductor โ Governor (calls + tokens + spend)
โ
subscription queue + tick schedule โ [Agentโ, Agentโ, ...]
โ
ContextBuilder ModelRouter.for_profile(tiny|fast|balanced|strong)
โ persona โ โ the right small model per agent
โ shared goal โผ
โ scene (projection) inference โ structured JSON event
โ memory (ledger view, windowed or salience-ranked)
โ visitor + granted tools
โ
Typed Event โ Ledger.append() (idempotent; SQLite-backed for long runs)
โ
Projections update โ Observer (read-only) โ Gradio UI (stage + ledger + stats + live config)
The live theater โ the two-tab Fishbowl UI (Lab + Show) built on this read surface โ is documented as-built in docs/architecture/fishbowl-ui.md.
Key decisions (see docs/adr/ for full reasoning)
| # | Decision |
|---|---|
| 0001 | Append-only event ledger as the sole source of truth |
| 0002 | Gradio as the UI layer |
| 0003 | Small specialist agents over one large model |
| 0004 | Document every architectural decision as we build |
| 0005 | Agent memory is a ledger view, not a separate store |
| 0006 | ContextBuilder owns prompt assembly; agents own only persona + action |
| 0007 | Governor is injected into the conductor to enforce call budgets |
| 0008 | Zero engine edits required to add a second scenario |
| 0009 | Event kinds are open + format-validated; authority lives in may_emit |
| 0010 | Per-agent model routing via logical profiles (ModelRouter) |
| 0011 | Declarative, validatable config โ UI/LLM-generatable (WorldConfig) |
| 0012 | Capability-based tool contract (ToolRegistry); MCP-ready |
| 0013 | Token-aware governor + long-running foundations (restore/snapshot/two-clock) |
| 0014 | Small models served on Modal, one OpenAI-compatible app per provider |
Add a world without code
Drop two YAML files into config/ and it appears in the app โ no engine edit.
# config/agents/town-crier.yaml
name: town-crier
persona: You are the Town Crier. Announce one bit of news in a sentence.
may_emit: [crier.announced] # a brand-new namespaced kind, minted by config
schedule: { tick_every: 1 }
model_profile: tiny # routed to a โค4B model
# config/scenarios/town-square.yaml
name: town-square
title: "๐ฃ Town Square"
goal: Keep the square informed.
default_seed: Market day in a town that forgets its own name nightly.
cast: [town-crier] # who participates
A UI form or an LLM can emit the same structure and validate it before running:
validate_world({...}) raises if a cast names an undefined agent. The invariant is
enforced by a test (tests/test_modularity.py). See
docs/architecture/config-system.md.
Repository map
app.py Gradio composition root (loads scenarios from config/)
config/ THE configurable surface (declarative, validatable)
models.yaml Logical profile โ catalogue key (model lives in modal/catalogue.py)
agents/*.yaml One AgentManifest per agent
scenarios/*.yaml One ScenarioConfig per scenario (cast = agent names)
src/
core/
events.py Event schema โ open, namespaced, validated kinds
ledger.py Append-only in-memory ledger
sqlite_ledger.py Persistent ledger (WAL, snapshot, restore, tail)
projections.py Pure-function stage projection (+ generic kind fallback)
conductor.py Two-clock loop, subscription+tick routing, restore/snapshot
memory.py Episodic / salience / reflection โ all ledger views
context.py ContextBuilder โ layered prompt assembly
governor.py Budget guard (calls + tokens + spend)
manifest.py AgentManifest โ the agent contract + resolve_model
config.py ScenarioConfig / ModelsConfig / WorldConfig + validators
registry.py Loads config/, resolves casts, binds handlers
structured.py JSON output instruction + tolerant parser
observer.py Read-only renderer with view diffs
agents/
base.py Agent ABC + ManifestAgent (the workhorse)
handlers.py Behaviour handlers (e.g. FortuneTeller โ calls a tool)
scenarios/
base.py Scenario dataclass (goal, genesis, legacy schedule)
thousand_token_wood.py Thin build_scenario() โ registry
mystery_roots.py Thin build_scenario() โ registry
models/
provider.py ModelProvider ABC + DeterministicTinyModel + usage
openai_compat.py OpenAI-compatible provider + credentials check
router.py ModelRouter โ per-agent profile โ small model
tools/
registry.py ToolRegistry โ capability-checked broker
builtins.py oracle tool + default_tool_registry()
ui/
render.py Gradio rendering helpers + live config panel
tests/ 185 passing tests, zero mocks
docs/
vision.md One-page product and technical vision
architecture/ Overview, model-routing, config-system, tool-contract, fishbowl-ui, โฆ
adr/ Architecture Decision Records (0001โ0013)
schema/ events / agent-manifest / scenario-config / world-config
runbooks/ strategy/ blog/ journal/
scripts/
resume_run.py Resume a long-running scenario from a SQLite ledger
new_journal_entry.py Creates dated build log entries
snapshot_progress.py Updates docs/blog/building-in-public.md from journal
modal/ OpenAI-compatible small-model serving on Modal
service.py Reusable vLLM serving layer (ModelConfig, register_model)
registry.py Declarative model catalogue, grouped by provider
app_*.py One Modal app per provider (nvidia/openbmb/google)
openapi.yaml Checked-in OpenAPI 3.1 spec for the served API
client.py OpenAI-SDK smoke-test client
docs/ Deploy guide, OpenAPI reference, Modal docs mirror
modal_app.py Optional: serverless scheduled run (Modal)
Hackathon targets
- Genuinely delightful โ strange, joyful, worth showing a friend
- AI is load-bearing โ agents generate the evolving scene; the user does not author it
- Small models โ every runtime model โค 32B, with an optional โค 4B Tiny Titan mode
- Polished Gradio โ custom theme, live ledger, visible agent trace, demo-ready seeds
- Prize stacking โ Thousand Token Wood, Community Choice, OpenAI Track, Tiny Titan, Best Agent, Off-Brand UI, Best Demo, Judges' Wildcard
Development loop
# 1. Build the thinnest slice
# 2. Record the decision
uv run python -c "from scripts.new_journal_entry import main; main()" "What changed today"
# 3. Regenerate the living blog
uv run scripts/snapshot_progress.py
# 4. Confirm nothing broke
uv run pytest tests/ -q
Small models, one shared log, and a clear view of how agents behave in motion. This is Multi-Agent Land.