labrats / README.md

Smestern

Deploy Labrats live Space (free-tier, remote embeddings)

657d2d2 verified 18 days ago


title: Labrats — Tiny Lab Agents
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Small (≤32B) LLM agents doing science in DiscoveryWorld
tags:
  - thousand-token-wood
  - agent
  - best-agent
  - best-demo
  - small-models
  - multi-agent
  - discoveryworld
  - track:wood
  - achievement:llama

Labrats — Tiny lab agents in DiscoveryWorld

A research lab of small (≤32B) LLM agents doing science inside AllenAI's DiscoveryWorld simulator. Each ReAct avatar has private episodic memory plus a shared lab notebook; N avatars share one world instance and resolve their actions simultaneously.

🔴 Run it live

The "Run live" tab runs a real episode (N ReAct agents on a fresh DiscoveryWorld scenario) entirely on the free CPU tier. Both the LLM and the memory embeddings run remotely via Hugging Face Inference Providers, so the build never has to install torch or sentence-transformers and cannot time out on them. Only one episode can be in flight at a time, and each episode is bounded by step and agent caps (MAX_STEPS_CAP, MAX_AGENTS_CAP).
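The single-episode gate and the caps described above could look roughly like the following. This is a minimal sketch, not the Space's actual code: the cap values, the `clamp_request` and `run_live_episode` helpers, and the `run_fn` callback are all hypothetical; only the constant names MAX_STEPS_CAP and MAX_AGENTS_CAP come from this README.

```python
import threading

# Hypothetical values: the README names these constants but not their values.
MAX_STEPS_CAP = 30
MAX_AGENTS_CAP = 3

# One process-wide lock stands in for "a single in-flight episode".
_episode_lock = threading.Lock()


def clamp_request(steps: int, agents: int) -> tuple[int, int]:
    """Clamp the requested step and agent counts into [1, cap]."""
    return (
        max(1, min(steps, MAX_STEPS_CAP)),
        max(1, min(agents, MAX_AGENTS_CAP)),
    )


def run_live_episode(steps: int, agents: int, run_fn):
    """Run one episode, refusing to start if another is already in flight.

    `run_fn(steps, agents)` is a hypothetical callback that executes the episode.
    """
    # Non-blocking acquire: a second caller is told "busy" instead of queueing.
    if not _episode_lock.acquire(blocking=False):
        return {"status": "busy"}
    try:
        steps, agents = clamp_request(steps, agents)
        return {"status": "done", "result": run_fn(steps, agents)}
    finally:
        _episode_lock.release()
```

Rejecting the second request rather than queueing it keeps a free-tier CPU from accumulating a backlog of episodes; whether the Space rejects or queues is not stated here, so treat that choice as an assumption.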

A finished live run is written into runs/ and appears in the Episode replay dropdown automatically.

Pinned traces

Two recorded episodes ship with the Space:

  - stage2-gemma4-12b: single agent, memory on, Instrument Measurement scenario. Measures the spectrum, then pivots PICKUP → PUT and completes the task.
  - dbg-verbose-gemma4: two agents on an Archaeology Dating dig, sharing one lab notebook and talking to each other. Scores 7/10 as they coordinate digging, dating, and flagging the oldest artifact.

The "Compare two runs" tab puts any two runs side by side so the behavioural delta is visible at a glance.

How it works (Phase C summary)

  - ChromaMemoryStore with three collections: private (per-author), notebook (shared), and chat (utterances).
  - HFInferenceEmbedder (BAAI/bge-small-en-v1.5, called via Inference Providers) for retrieval; cosine ANN inside Chroma, re-ranked in Python. No local torch.
  - Composite score per Generative Agents (UIST 2023): recency * 0.995^age + importance/10 + relevance.
  - Rule-based writer: every successful USE of an instrument → measurement record in the notebook; every engine-rejected action → failure record in the private tier.
  - Retrieval (k=3 per tier) is injected into the prompt under ## Lab notebook / ## Your notes headers before each ReAct decision.
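
To make the composite score and the prompt injection concrete, here is a small sketch of the re-ranking step in plain Python. It is illustrative only: the `Record` class and the `composite_score`, `top_k`, `render_section`, and `build_memory_block` helpers are hypothetical names, `recency` in the formula is read as a weight (defaulting to 1.0), the unit of `age` is assumed, and relevance is assumed to arrive precomputed as a cosine similarity from the vector store.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    age: float         # time since the record was written; the unit is an assumption
    importance: float  # expected on a 0..10 scale, hence the /10 below
    relevance: float   # cosine similarity to the query, assumed precomputed


def composite_score(rec: Record, recency_weight: float = 1.0, decay: float = 0.995) -> float:
    """recency * 0.995^age + importance/10 + relevance, as in the summary above."""
    return recency_weight * decay ** rec.age + rec.importance / 10 + rec.relevance


def top_k(records, k=3):
    """Re-rank candidate records by composite score and keep the best k."""
    return sorted(records, key=composite_score, reverse=True)[:k]


def render_section(header, records):
    """Render one memory tier as a '## header' block with one bullet per record."""
    lines = [f"## {header}"]
    lines += [f"- {r.text}" for r in records]
    return "\n".join(lines)


def build_memory_block(notebook, notes, k=3):
    """Build the text injected into the prompt before a ReAct decision."""
    return "\n\n".join([
        render_section("Lab notebook", top_k(notebook, k)),
        render_section("Your notes", top_k(notes, k)),
    ])
```

With these definitions, a fresh, moderately relevant record outranks a stale one with zero importance and relevance, because the `0.995 ** age` term decays toward zero as `age` grows.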

Both the LLM and the embeddings run remotely, so the Space deploys cleanly on the free CPU tier. To run the agent locally, see the repo's main README.