Spaces:

build-small-hackathon
/

multi-agent-lab

Sleeping

App Files Files Community

multi-agent-lab / README.md

agharsallah

feat(about): add About tab with project overview, architecture, and links

52b556c 16 days ago

preview code

Raw

History Blame Contribute Delete

12.7 kB

	---
	title: Multi-Agent Land
	emoji: 🌲
	colorFrom: green
	colorTo: indigo
	sdk: gradio
	sdk_version: "6.16.0"
	python_version: "3.10"
	app_file: app.py
	pinned: true
	tags:
	- agent-demo-track
	- track:wood
	- sponsor:openai
	- sponsor:nvidia
	- sponsor:modal
	- sponsor:openbmb
	- achievement:offbrand
	- achievement:sharing
	- achievement:fieldnotes
	---

	# Multi-Agent Land

	Small models, one shared log, and a clear view of how agents behave in motion.

	Most multi-agent systems are hard to inspect: agents call each other directly and the
	state gets messy. We wanted to see small agents in action — not isolated prompts, but
	small models interacting over time: **debating, collaborating, playing games, and
	pushing each other** in a shared environment.

	So we built one. Every action — thoughts, tool calls, state updates — is appended to a
	single immutable log. When one agent asks, another answers, a judge evaluates, and a
	keeper tracks progress, nothing is sent agent-to-agent — every interaction flows
	through that one shared ledger. So you can follow the whole run, step by step.

	---

	## Submission


	- 🎬 Demo video: https://youtu.be/v8-zR6eTbDM
	- 📣 Social post: https://www.linkedin.com/posts/gharsallah_huggingface-hackathon-buildsmall-activity-7472383877991501824-8vxO
	- 💻 GitHub link: https://github.com/abducodez/multi-agent-land
	- 👥 Team (Hugging Face usernames): <!-- TODO: list every teammate's HF username; each must register + join the org separately -->
	- `@agharsalah`

	> Tags (track + badges) live in the YAML block at the top of this README — without
	> them the project can't be placed in a category.

	---

	## Quickstart

	```bash
	uv sync # create .venv and install everything from the lockfile

	# Optional: configure live inference (else the app runs fully offline)
	cp .env.example .env # then set MODAL_WORKSPACE

	uv run app.py
	```

	> Don't have [uv](https://docs.astral.sh/uv/)? `curl -LsSf https://astral.sh/uv/install.sh \| sh`

	The app runs on a deterministic local stub with no API key — great for testing
	and demos that need to be fully reproducible. To go live, deploy the small models
	in [`modal/`](modal/README.md) and set `MODAL_WORKSPACE` in `.env`; every agent then
	binds to its model by catalogue key (`modal/catalogue.py`). There is no generic
	cloud key — live inference is always against models you deploy yourself.

	### Run it live

	By default the app runs fully offline on the deterministic stub. To use real
	small-model inference — Modal-served models, a persistent Neon/Postgres ledger,
	and the optional mem0 memory index — copy `.env.example` to `.env` and set the
	relevant variables. A live run stays bounded by the Governor and the UI auto-stops
	autoplay at budget/verdict, so it won't loop forever.

	See [docs/runbook-live-mode.md](docs/runbook-live-mode.md) for the step-by-step
	runbook and the safety story.

	### Run tests

	```bash
	uv run pytest tests/ -v
	```

	---

	## What It Is

	A tiny theater engine powered by specialist small-model agents. Agents never
	call each other directly — they post typed events to a shared append-only ledger,
	and every view (the stage, the memory, the UI) is a projection derived from that log.

	The loop is simple: define the environment and the agent roles, then launch them in
	the multi-agent lab. Each scenario can run for a long time — agents debate, collaborate,
	play games, and push each other — while a live telemetry view lets you follow the whole
	run step by step.

	What makes it super modular:
	- Config, not code. Agents, scenarios, casts, model tiers, tool grants, and
	budgets are declarative YAML under `config/`, validated by a schema. Add a world
	by adding files — proven by `tests/test_modularity.py` (zero engine edits).
	- A model per agent. Each agent declares a logical profile (`tiny`/`fast`/
	`balanced`/`strong`); a `ModelRouter` binds it to a concrete small model served on
	Modal — Nemotron, MiniCPM, Gemma. Mix a ≤4B worker with a ≤32B judge in one cast.
	- Capability-checked tools. Agents call tools only if their manifest grants
	them — the contract that fronts in-process tools today and MCP servers later.
	- Built to run for hours. The ledger is the checkpoint: `restore()` resumes a
	killed run; a token-aware governor bounds spend; `step(n_ticks=N)` maps one
	wall-clock episode onto N sim-ticks.

	The user can Start from any seed, Advance a turn, Drop a disturbance,
	and Switch scenarios — all live.

	### Scenarios (each is a YAML config)

	\| Name \| Cognitive task \| Cast (model tiers) \|
	\|---\|---\|---\|
	\| 🍄 Thousand Token Wood \| Divergent world-growth \| Seedkeeper `fast`, Critic `balanced`, Pocket Actor `tiny`, Echo `fast` \|
	\| 🔍 Mystery Roots \| Convergent mystery-solving \| Clue Gatherer `fast`, Hypothesis Former `balanced`, Devil's Advocate `fast`, Judge `strong` \|
	\| 🔮 Oracle Grove \| Tool-using prophecy \| Seedkeeper `fast`, Fortune-Teller `fast` + `oracle` tool \|

	Adding a fourth scenario is a new YAML file in `config/scenarios/`. Zero engine edits.

	---

	## Architecture in 90 seconds

	```
	config/ (YAML) → Registry → Scenario(cast) + ModelRouter + ToolRegistry
	│
	Visitor seed or disturbance
	│
	Conductor ← Governor (calls + tokens + spend)
	│
	subscription queue + tick schedule → [Agent₁, Agent₂, ...]
	│
	ContextBuilder ModelRouter.for_profile(tiny\|fast\|balanced\|strong)
	├ persona │ → the right small model per agent
	├ shared goal ▼
	├ scene (projection) inference → structured JSON event
	├ memory (ledger view, windowed or salience-ranked)
	└ visitor + granted tools
	│
	Typed Event → Ledger.append() (idempotent; SQLite-backed for long runs)
	│
	Projections update → Observer (read-only) → Gradio UI (stage + ledger + stats + live config)
	```

	The live theater — the two-tab Fishbowl UI (Lab + Show) built on this read surface —
	is documented as-built in
	[docs/architecture/fishbowl-ui.md](docs/architecture/fishbowl-ui.md).

	### Key decisions (see `docs/adr/` for full reasoning)

	\| # \| Decision \|
	\|---\|---\|
	\| 0001 \| Append-only event ledger as the sole source of truth \|
	\| 0002 \| Gradio as the UI layer \|
	\| 0003 \| Small specialist agents over one large model \|
	\| 0004 \| Document every architectural decision as we build \|
	\| 0005 \| Agent memory is a ledger view, not a separate store \|
	\| 0006 \| `ContextBuilder` owns prompt assembly; agents own only persona + action \|
	\| 0007 \| `Governor` is injected into the conductor to enforce call budgets \|
	\| 0008 \| Zero engine edits required to add a second scenario \|
	\| 0009 \| Event kinds are open + format-validated; authority lives in `may_emit` \|
	\| 0010 \| Per-agent model routing via logical profiles (`ModelRouter`) \|
	\| 0011 \| Declarative, validatable config — UI/LLM-generatable (`WorldConfig`) \|
	\| 0012 \| Capability-based tool contract (`ToolRegistry`); MCP-ready \|
	\| 0013 \| Token-aware governor + long-running foundations (restore/snapshot/two-clock) \|
	\| 0014 \| Small models served on Modal, one OpenAI-compatible app per provider \|

	---

	## Add a world without code

	Drop two YAML files into `config/` and it appears in the app — no engine edit.

	```yaml
	# config/agents/town-crier.yaml
	name: town-crier
	persona: You are the Town Crier. Announce one bit of news in a sentence.
	may_emit: [crier.announced] # a brand-new namespaced kind, minted by config
	schedule: { tick_every: 1 }
	model_profile: tiny # routed to a ≤4B model
	```

	```yaml
	# config/scenarios/town-square.yaml
	name: town-square
	title: "📣 Town Square"
	goal: Keep the square informed.
	default_seed: Market day in a town that forgets its own name nightly.
	cast: [town-crier] # who participates
	```

	A UI form or an LLM can emit the same structure and validate it before running:
	`validate_world({...})` raises if a cast names an undefined agent. The invariant is
	enforced by a test (`tests/test_modularity.py`). See
	[docs/architecture/config-system.md](docs/architecture/config-system.md).

	## Repository map

	```
	app.py Gradio composition root (loads scenarios from config/)
	config/ THE configurable surface (declarative, validatable)
	models.yaml Logical profile → catalogue key (model lives in modal/catalogue.py)
	agents/*.yaml One AgentManifest per agent
	scenarios/*.yaml One ScenarioConfig per scenario (cast = agent names)
	src/
	core/
	events.py Event schema — open, namespaced, validated kinds
	ledger.py Append-only in-memory ledger
	sqlite_ledger.py Persistent ledger (WAL, snapshot, restore, tail)
	projections.py Pure-function stage projection (+ generic kind fallback)
	conductor.py Two-clock loop, subscription+tick routing, restore/snapshot
	memory.py Episodic / salience / reflection — all ledger views
	context.py ContextBuilder — layered prompt assembly
	governor.py Budget guard (calls + tokens + spend)
	manifest.py AgentManifest — the agent contract + resolve_model
	config.py ScenarioConfig / ModelsConfig / WorldConfig + validators
	registry.py Loads config/, resolves casts, binds handlers
	structured.py JSON output instruction + tolerant parser
	observer.py Read-only renderer with view diffs
	agents/
	base.py Agent ABC + ManifestAgent (the workhorse)
	handlers.py Behaviour handlers (e.g. FortuneTeller — calls a tool)
	scenarios/
	base.py Scenario dataclass (goal, genesis, legacy schedule)
	thousand_token_wood.py Thin build_scenario() → registry
	mystery_roots.py Thin build_scenario() → registry
	models/
	provider.py ModelProvider ABC + DeterministicTinyModel + usage
	openai_compat.py OpenAI-compatible provider + credentials check
	router.py ModelRouter — per-agent profile → small model
	tools/
	registry.py ToolRegistry — capability-checked broker
	builtins.py oracle tool + default_tool_registry()
	ui/
	render.py Gradio rendering helpers + live config panel
	tests/ 185 passing tests, zero mocks
	docs/
	vision.md One-page product and technical vision
	architecture/ Overview, model-routing, config-system, tool-contract, fishbowl-ui, …
	adr/ Architecture Decision Records (0001–0013)
	schema/ events / agent-manifest / scenario-config / world-config
	runbooks/ strategy/ blog/ journal/
	scripts/
	resume_run.py Resume a long-running scenario from a SQLite ledger
	new_journal_entry.py Creates dated build log entries
	snapshot_progress.py Updates docs/blog/building-in-public.md from journal
	modal/ OpenAI-compatible small-model serving on Modal
	service.py Reusable vLLM serving layer (ModelConfig, register_model)
	registry.py Declarative model catalogue, grouped by provider
	app_*.py One Modal app per provider (nvidia/openbmb/google)
	openapi.yaml Checked-in OpenAPI 3.1 spec for the served API
	client.py OpenAI-SDK smoke-test client
	docs/ Deploy guide, OpenAPI reference, Modal docs mirror
	modal_app.py Optional: serverless scheduled run (Modal)
	```

	---

	## Hackathon targets

	- Genuinely delightful — strange, joyful, worth showing a friend
	- AI is load-bearing — agents generate the evolving scene; the user does not author it
	- Small models — every runtime model ≤ 32B, with an optional ≤ 4B Tiny Titan mode
	- Polished Gradio — custom theme, live ledger, visible agent trace, demo-ready seeds
	- Prize stacking — Thousand Token Wood, Community Choice, OpenAI Track, Tiny Titan,
	Best Agent, Off-Brand UI, Best Demo, Judges' Wildcard

	---

	## Development loop

	```bash
	# 1. Build the thinnest slice
	# 2. Record the decision
	uv run python -c "from scripts.new_journal_entry import main; main()" "What changed today"
	# 3. Regenerate the living blog
	uv run scripts/snapshot_progress.py
	# 4. Confirm nothing broke
	uv run pytest tests/ -q
	```

	---

	Small models, one shared log, and a clear view of how agents behave in motion.
	This is Multi-Agent Land.