AI Puppet Theater: From Premise to Puppet Show

Community Article
Published June 14, 2026

A small interactive show where AI agents perform as puppet characters instead of just chatting.

Most AI demos are still built around a chat interface. We wanted to build something different: a small stage where agents act like puppet characters, talk to each other, and move the show forward without following a fixed script.

AI Puppet Theater starts with a short premise and turns it into a tiny puppet show. A Director agent orchestrates the show: it creates the cast and backdrop, manages the flow, and decides which Actor should speak next. Puppet Actors respond based on their goals, memory, available props, and what is happening on stage. The audience can interrupt by throwing objects onto the stage, and voice playback makes it feel less like reading text and more like watching a performance.

Try the Space: AI Puppet Theater

We built it for the Build Small Hackathon, where the goal was to create useful or delightful projects with models under 32B parameters. Our main model experiments used openbmb/MiniCPM5-1B, a 1B-parameter model, for local/off-grid Actor generation and fine-tuning.

From premise to show

[Video: puppet demo clip]

The app turns a premise into a generated stage, cast, backdrop, and beat-by-beat puppet show.

The flow is intentionally simple. The user enters a premise, selects the length of the show, and clicks Create Show. From there, the Director agent starts setting up the stage: it comes up with a show title, setting, cast, backdrop, and initial plan for the performance. Each puppet Actor gets a persona, a goal, and enough state to behave like a character rather than a plain text generator.

Once the cast is ready, the user can run the show one beat at a time with Run One Beat, or let the full scene play out with Run Full Act. The audience can also throw objects onto the stage, which Actors may pick up as props and use to change the direction of the scene. Voice mode adds another layer by letting the user listen to the puppet Actors instead of only reading the transcript.

The Director keeps the show from turning into endless agent chatter. It chooses the next speaker, guides the story through setup, chaos, reveals, and finale, and tries to keep the act short, structured, and coherent.
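To make the pacing idea concrete, here is a minimal sketch of how a beat number could be mapped to a story phase. The phase names come from the description above; the function name `phase_for_beat` and the 25% / 70% split are assumptions for illustration, not the app's actual logic.

```python
PHASES = ("setup", "chaos", "reveal", "finale")


def phase_for_beat(beat_index: int, total_beats: int) -> str:
    """Map a zero-based beat index to a story phase (hypothetical split)."""
    if total_beats < 1:
        raise ValueError("total_beats must be at least 1")
    if beat_index >= total_beats - 1:
        return "finale"  # the last beat always closes the show
    progress = beat_index / total_beats
    if progress < 0.25:
        return "setup"
    if progress < 0.7:
        return "chaos"
    return "reveal"


if __name__ == "__main__":
    # Print the phase plan for an eight-beat show.
    print([phase_for_beat(i, 8) for i in range(8)])
```

With these assumed thresholds, an eight-beat show gets two setup beats, four chaos beats, one reveal, and a finale. Tying the finale to the last beat is what keeps the act from running long.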

How each beat works

[Figure: AI Puppet Theater agent loop diagram]

Each beat flows through the Director, Actor, validator, tools, renderer, and trace.

When the user clicks Run One Beat, the app advances the show by one meaningful Director/Actor cycle. The Director agent acts as the orchestrator for the beat. It first looks at the current session state: the transcript so far, the state of each Actor, audience actions, available props, and whether the show is close to the finale. Based on that, it decides who should speak next, what instructions to give the Actor, and whether any prop, secret, stage effect, or finale request should matter.
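The speaker choice described above can be approximated with a simple deterministic heuristic. The sketch below is a stand-in, not the Director agent itself: `ActorState`, `pick_next_speaker`, and the rules (avoid back-to-back turns, favor whoever holds a newly thrown prop, otherwise pick whoever has been silent longest) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActorState:
    name: str
    props: list = field(default_factory=list)
    last_spoke_beat: int = -1  # -1 means the Actor has not spoken yet


def pick_next_speaker(actors: list, last_speaker: Optional[str],
                      new_prop: Optional[str] = None) -> str:
    """Pick the next speaker with a hypothetical rotation heuristic."""
    # Avoid the same Actor twice in a row unless there is nobody else.
    candidates = [a for a in actors if a.name != last_speaker] or list(actors)
    # An Actor holding a freshly thrown prop gets priority.
    if new_prop is not None:
        for actor in candidates:
            if new_prop in actor.props:
                return actor.name
    # Otherwise choose whoever has been silent the longest.
    return min(candidates, key=lambda a: a.last_spoke_beat).name


if __name__ == "__main__":
    cast = [
        ActorState("Mop", last_spoke_beat=2),
        ActorState("Biscuit", props=["rubber duck"], last_spoke_beat=3),
        ActorState("Zed", last_spoke_beat=0),
    ]
    print(pick_next_speaker(cast, last_speaker="Mop"))
    print(pick_next_speaker(cast, last_speaker="Mop", new_prop="rubber duck"))
```

A rule-based picker like this could also serve as the deterministic path when a model-backed Director decision is unavailable.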

The selected Actor then generates a structured stage response instead of free-form text. This response includes the Actor’s line, emotion, gesture, stage effect, memory update, and optionally a tool request. Before the response reaches the stage, the runtime validates it. If the selected model fails, times out, repeats itself, or returns invalid JSON that cannot be repaired, the deterministic fallback creates a simpler beat so the show can continue.
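A validator for this kind of structured response might look like the sketch below. The field names (`line`, `emotion`, `gesture`, `stage_effect`, `memory_update`, `tool_request`), the 200-character limit, and the error labels are assumptions based on the description above; the real schema may differ.

```python
import json
from typing import Optional

# Hypothetical field names derived from the prose description.
REQUIRED_FIELDS = ("line", "emotion", "gesture", "stage_effect", "memory_update")
MAX_LINE_CHARS = 200  # assumed limit for a short, speakable line


def validate_actor_response(raw: str, previous_lines: list) -> tuple:
    """Return (data, errors); data is None whenever any check fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["invalid_json"]
    if not isinstance(data, dict):
        return None, ["not_an_object"]

    errors = [f"missing:{key}" for key in REQUIRED_FIELDS
              if not isinstance(data.get(key), str)]

    line = data.get("line")
    if isinstance(line, str):
        if not line.strip():
            errors.append("empty_line")
        if len(line) > MAX_LINE_CHARS:
            errors.append("line_too_long")
        seen = {p.strip().lower() for p in previous_lines}
        if line.strip().lower() in seen:
            errors.append("repeated_line")

    # The tool request is optional, but must be well-formed when present.
    tool: Optional[object] = data.get("tool_request")
    if tool is not None and not (
        isinstance(tool, dict) and isinstance(tool.get("name"), str)
    ):
        errors.append("bad_tool_request")

    return (data if not errors else None), errors
```

Returning a list of error labels rather than a bare boolean gives a trace or a repair prompt something specific to work with.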

After validation, the app updates the Actor’s state, transcript, stage rendering, voice payload, Director log, and trace. If the Actor requested a valid theatrical tool, such as inspecting a prop, consulting the stage oracle, or changing the lighting, that tool runs and its result becomes part of the next state. If the beat is a finale, the curtain falls and the show stops advancing.
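One common way to keep tool use safe is an allowlist registry: only tools registered by name can run, and anything else is ignored. The sketch below assumes a plain dict for stage state and hypothetical tool names (`inspect_prop`, `change_lighting`); it illustrates the pattern rather than the app's tool layer.

```python
from typing import Callable, Optional


def inspect_prop(state: dict, args: dict) -> str:
    prop = args.get("prop", "")
    if prop in state["props"]:
        return f"The {prop} looks suspiciously important."
    return "There is no such prop on stage."


def change_lighting(state: dict, args: dict) -> str:
    state["lighting"] = args.get("mood", "neutral")
    return f"The lights shift to {state['lighting']}."


# Allowlist: only tools registered here can ever run.
TOOLS: dict = {
    "inspect_prop": inspect_prop,
    "change_lighting": change_lighting,
}


def run_tool_request(state: dict, request: object) -> Optional[str]:
    """Run an allowlisted tool and record its result in the stage state."""
    if not isinstance(request, dict):
        return None
    name = request.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return None  # unknown or malformed requests are ignored
    args = request.get("args")
    tool: Callable[[dict, dict], str] = TOOLS[name]
    result = tool(state, args if isinstance(args, dict) else {})
    # The result becomes part of the state that the next beat sees.
    state["tool_results"].append({"tool": name, "result": result})
    return result
```

Because the result is appended to the state, the next Director or Actor call can react to what the tool found.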

Features that make it feel like a show

[Figure: AI Puppet Theater feature screenshot]

To make the app feel more like an actual puppet show and less like a text generator, we added a few features around the core agent loop. When the user clicks Create Show, the app can generate or select a backdrop for the stage so the show has a visual setting. When an Actor speaks, its puppet card is highlighted and gently bounces on the stage, while the transcript updates with the performed line. The app also supports voice modes, so the user can listen to the puppet Actors instead of only reading the text.

We also wanted the audience to be part of the show. The audience can choose an object and throw it onto the stage, which can change what the Director prioritizes next. They can also summon a new character or request a finale. We added limits around these interactions because too many new characters or interruptions can make the story harder to contain.
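Limits like these can be expressed as a small per-show budget. In the sketch below, the class name `AudienceBudget`, the action names, and the specific caps are assumptions; the article does not state the app's actual limits.

```python
class AudienceBudget:
    """Track how many times each audience action has been used in a show."""

    def __init__(self, limits: dict):
        self.limits = dict(limits)
        self.used = {action: 0 for action in limits}

    def try_use(self, action: str) -> bool:
        """Consume one use of an action; return False if it is not allowed."""
        if action not in self.limits:
            return False  # unknown actions are rejected
        if self.used[action] >= self.limits[action]:
            return False  # the cap for this show has been reached
        self.used[action] += 1
        return True


if __name__ == "__main__":
    # Hypothetical caps for a single show.
    budget = AudienceBudget(
        {"throw_object": 3, "summon_character": 1, "request_finale": 1}
    )
    print(budget.try_use("summon_character"))  # first summon is accepted
    print(budget.try_use("summon_character"))  # second summon is refused
```

Checking the budget before an action reaches the Director keeps the rule in one place, so the UI can disable a button as soon as `try_use` starts returning False.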

The app also has a few behind-the-scenes panels. The transcript shows the show so far, while the Agent State panel shows each Actor’s mood, goal, props, memory, and secret status without revealing hidden secrets too early. For deeper debugging, the Trace panel shows a sanitized record of what happened: Director decisions, Actor responses, validation results, tool calls, backend status, and fallback events. This made the app easier to demo and easier to debug without exposing tokens, private paths, raw errors, or hidden reasoning.
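A trace sanitizer along these lines could be sketched as a recursive redaction pass. The key names in `HIDDEN_KEYS` and the two regular expressions are assumptions, and deliberately crude: the token pattern only targets strings starting with `hf_`, and the path pattern matches any run of two or more slash-separated segments.

```python
import re

TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{8,}")  # strings that look like HF tokens
PATH_RE = re.compile(r"(?:/[\w.\-]+){2,}")    # crude match for file paths
HIDDEN_KEYS = {"secret", "reasoning", "raw_error"}  # hypothetical key names


def sanitize(value):
    """Return a redacted copy of a trace event without mutating the input."""
    if isinstance(value, dict):
        return {
            key: "[hidden]" if key in HIDDEN_KEYS else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, str):
        value = TOKEN_RE.sub("[token]", value)
        return PATH_RE.sub("[path]", value)
    return value  # numbers, booleans, and None pass through unchanged
```

Sanitizing at the point where events are written to the trace, rather than when they are displayed, means nothing sensitive is stored in the first place.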

Model backends, validation, and graceful fallback

The app supports multiple backend paths for generating the show. Early in development, we started with a deterministic backend that followed a fixed conversation flow. This helped us build the core stage, state, tools, and UI without getting blocked on model behavior. It also became a graceful degradation path: if a model call fails or returns unusable output, the show can still continue instead of crashing.

Apart from the deterministic path, the app supports model-backed generation through the Hugging Face API while staying within the hackathon’s small-model constraint. The hosted path currently uses Qwen3-4B-Instruct-2507 when configured. We also added local/off-grid options based on openbmb/MiniCPM5-1B, including local LoRA and local GGUF Actor backends. These local paths are more experimental and depend on the runtime having the right model files and hardware available, but they make it possible to run the Actor model without relying solely on an external API.

For the Actor agent, we fine-tuned openbmb/MiniCPM5-1B with LoRA on a synthetic Actor dataset. The goal was narrow: produce one short, theatrical, speakable Actor JSON object for a single puppet-show beat. The LoRA adapter is available here: AI-Puppet-Theater-MiniCPM5-Actor-LoRA. We also merged the LoRA adapter into the base model, converted it to GGUF, and evaluated it with llama.cpp for local inference. That version is available here: AI-Puppet-Theater-MiniCPM5-Actor-GGUF.

The important part is that model output is never used blindly. Actor responses are expected to follow a structured shape, so the app validates the output before it reaches the stage. If the response is invalid, the app can try a repair prompt once. If that still fails, it falls back to the deterministic path so the show can keep moving.
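The validate, repair once, then fall back sequence can be sketched as follows. The model call and the fallback are passed in as plain callables so the example stays self-contained; `generate_beat`, `parse_actor_json`, and the repair prompt wording are assumptions, not the app's actual code.

```python
import json
from typing import Callable, Optional


def parse_actor_json(raw: object) -> Optional[dict]:
    """Return the parsed object if it has a non-empty 'line', else None."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if (isinstance(data, dict) and isinstance(data.get("line"), str)
            and data["line"].strip()):
        return data
    return None


def generate_beat(prompt: str, call_model: Callable[[str], str],
                  fallback: Callable[[], dict]) -> tuple:
    """Try the model, allow a single repair attempt, then fall back."""
    try:
        raw = call_model(prompt)
    except Exception:
        return fallback(), "fallback"  # model failure or timeout
    beat = parse_actor_json(raw)
    if beat is not None:
        return beat, "model"

    # One repair attempt: show the model its invalid reply and ask again.
    repair_prompt = (
        f"{prompt}\n\nYour previous reply was not a valid Actor JSON object. "
        f"Return only one JSON object.\nPrevious reply:\n{raw}"
    )
    try:
        beat = parse_actor_json(call_model(repair_prompt))
    except Exception:
        beat = None
    if beat is not None:
        return beat, "repaired"
    return fallback(), "fallback"
```

Returning the source label (`model`, `repaired`, or `fallback`) alongside the beat makes it straightforward to record in a trace which path produced each line.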

For the visual stage, the app can also use a text-to-image model, black-forest-labs/FLUX.1-schnell, to generate backdrop images when the backend is configured. If image generation is not available, it falls back to a selected or curated backdrop instead.

What we learned

The hardest part was not generating puppet lines; it was keeping the show theatrical, short, and coherent. Without constraints, agents can easily drift into long conversations or repeat similar beats. The Director became the orchestration layer for the show: it manages pacing, story phase, speaker rotation, audience actions, and when to move toward a finale. That separation helped the Actors focus on responding in character instead of also trying to control the whole story.

We also learned that small UI details change how the same model output feels. A generated backdrop, a bouncing puppet card, audience props, voice playback, and a visible transcript make the app feel more like a show than just another text generator.

On the model side, the most useful learning was that the Actor role was narrow enough to fine-tune. Instead of asking a model to do everything, we trained it for one repeated behavior: given the show state, actor state, and Director instruction, return one short Actor JSON object. That pushed us to build a synthetic Actor SFT dataset, validators for the exact schema, and eval scripts that check things like JSON parsing, required fields, tool requests, and line length.
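An eval script of this kind can be as simple as running a set of checks over each raw model output and reporting pass rates. In the sketch below, the field names, the allowed tool names, and the 200-character limit are assumptions; only the categories of checks (JSON parsing, required fields, tool requests, line length) come from the description above.

```python
import json

REQUIRED_FIELDS = ("line", "emotion", "gesture")  # assumed subset of the schema
ALLOWED_TOOLS = {"inspect_prop", "consult_oracle", "change_lighting"}  # hypothetical
MAX_LINE_CHARS = 200  # assumed limit


def check_output(raw: str) -> dict:
    """Run each check on one raw model output."""
    checks = {"json_object": False, "has_fields": False,
              "line_ok": False, "tool_ok": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return checks
    if not isinstance(data, dict):
        return checks
    checks["json_object"] = True
    checks["has_fields"] = all(isinstance(data.get(k), str) for k in REQUIRED_FIELDS)
    line = data.get("line")
    checks["line_ok"] = isinstance(line, str) and 0 < len(line.strip()) <= MAX_LINE_CHARS
    tool = data.get("tool_request")
    # A missing tool request is fine; a present one must name an allowed tool.
    checks["tool_ok"] = tool is None or (
        isinstance(tool, dict)
        and isinstance(tool.get("name"), str)
        and tool["name"] in ALLOWED_TOOLS
    )
    return checks


def pass_rates(outputs: list) -> dict:
    """Fraction of outputs passing each check; empty input gives an empty dict."""
    if not outputs:
        return {}
    results = [check_output(raw) for raw in outputs]
    return {name: sum(r[name] for r in results) / len(results)
            for name in results[0]}
```

Reporting each check separately, rather than one overall score, shows whether a model is failing at JSON syntax, at the schema, or at the style constraints.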

We also learned that local inference is not just about converting a model file. After training the LoRA adapter, we merged it into the base model, converted it to GGUF, and tested it with llama.cpp. The first GGUF runs were not reliable until we found the right prompt format and runtime flags. That was a useful reminder that model quality, prompt format, runtime, validation, and fallback all have to work together.

We’ll cover the fine-tuning path in a separate technical post, but for the main app the takeaway was simple: model-backed features are easier to trust when the runtime can inspect, validate, repair, and recover from failures.

Links and credits

Try the app: AI Puppet Theater Space

Watch the full demo: AI Puppet Theater Demo

Related artifacts:

Credits and thanks:

  • The Hugging Face and Gradio teams for organizing the Build Small Hackathon and providing the platform, compute credits, and motivation to build something small and complete.
  • OpenBMB for MiniCPM5-1B, which we used for local/off-grid model experiments and Actor fine-tuning.
  • Modal for making the LoRA/QLoRA training workflow practical on GPU.
  • Black Forest Labs for FLUX.1-schnell, which we used for optional backdrop generation.
