---
title: Labrats — Tiny Lab Agents
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Small (≤32B) LLM agents doing science in DiscoveryWorld
tags:
- thousand-token-wood
- agent
- best-agent
- best-demo
- small-models
- multi-agent
- discoveryworld
- track:wood
- achievement:llama
---
Labrats — Tiny lab agents in DiscoveryWorld
A research lab of small (≤32B) LLM agents doing science inside AllenAI's DiscoveryWorld simulator. Each ReAct avatar has private episodic memory plus a shared lab notebook; N avatars share one world instance and resolve their actions simultaneously.
🔴 Run it live
The "Run live" tab runs a real episode (N ReAct agents on a fresh DiscoveryWorld scenario) entirely on the free CPU tier. Both the LLM and the memory embeddings are served remotely through Hugging Face Inference Providers, so the build installs no `torch` or `sentence-transformers` that could make it time out. Only one episode can be in flight at a time, and the step and agent counts are capped by `MAX_STEPS_CAP` and `MAX_AGENTS_CAP`.
A finished live run is written into runs/ and appears in the Episode replay dropdown automatically.
Pinned traces
Two recorded episodes ship with the Space:
- `stage2-gemma4-12b`: single agent, memory on, Instrument Measurement scenario. The agent measures the spectrum, then pivots `PICKUP` → `PUT` and completes the task.
- `dbg-verbose-gemma4`: two agents on an Archaeology Dating dig, sharing one lab notebook and talking to each other. The run scores 7/10 as they coordinate digging, dating, and flagging the oldest artifact.
The "Compare two runs" tab puts any two side-by-side so the behavioural delta is visible at a glance.
How it works (Phase C summary)
- `ChromaMemoryStore` with three collections: `private` (per-author), `notebook` (shared), and `chat` (utterances).
- `HFInferenceEmbedder` (`BAAI/bge-small-en-v1.5`, called via Inference Providers) for retrieval; cosine ANN inside Chroma, re-ranked in Python. No local `torch`.
- Composite score per Generative Agents (UIST 2023): `recency * 0.995^age + importance/10 + relevance`.
- Rule-based writer: every successful `USE` of an instrument → `measurement` record in the notebook; every engine-rejected action → `failure` record in the private tier.
- Retrieval (k=3 per tier) is injected into the prompt under `## Lab notebook` / `## Your notes` headers before each ReAct decision.
Both the LLM and the embeddings run remotely, so the Space deploys cleanly on the free CPU tier. To run the agent locally, see the repo's main README.