How to use from the llama-cpp-python library
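# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the quantized GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="build-small-hackathon/AI-Puppet-Theater-MiniCPM5-Actor-GGUF",
    filename="minicpm5-actor-q4_k_m.gguf",
)

# Keep the return value so the generated text can be read back.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
)
print(response["choices"][0]["message"]["content"])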

AI Puppet Theater MiniCPM5 Actor GGUF

Q4_K_M GGUF export of the AI Puppet Theater Actor model.

This model was fine-tuned from openbmb/MiniCPM5-1B using LoRA/QLoRA Actor SFT, merged back into the base model, converted to GGUF, and quantized for llama.cpp.

It is designed to generate one short Actor JSON object for AI Puppet Theater. It is not a general assistant model.

For the training, merge, GGUF conversion, and evaluation story, see the fine-tuning blog linked below.

Intended Use

Generate one Actor response for a single AI Puppet Theater beat. The runtime should pass a premise, show state JSON, actor JSON, and director instruction, then extract and validate the returned JSON before using it.

Expected top-level schema:

{
  "intent": "react_to_event",
  "line": "A short puppet line tied to the scene.",
  "emotion": "curious",
  "gesture": "tilts head toward the stage lights",
  "stage_effect": "spotlight_glow",
  "memory_update": null,
  "tool_request": null
}
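
A minimal sketch of a check for the exact top-level schema shown above. The key names come from the example object; the function name `check_actor_schema` and the rule that the five text fields must be non-empty strings are assumptions made for illustration, not the application's actual validator.

```python
import json

# Keys taken from the expected top-level schema above.
REQUIRED_KEYS = (
    "intent", "line", "emotion", "gesture",
    "stage_effect", "memory_update", "tool_request",
)

def check_actor_schema(obj):
    """Return a list of problems; an empty list means the top-level keys match exactly."""
    if not isinstance(obj, dict):
        return ["response is not a JSON object"]
    problems = []
    missing = [k for k in REQUIRED_KEYS if k not in obj]
    extra = [k for k in obj if k not in REQUIRED_KEYS]
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if extra:
        problems.append("unexpected keys: " + ", ".join(extra))
    # Assumption: these five fields must be non-empty strings.
    for key in ("intent", "line", "emotion", "gesture", "stage_effect"):
        if key in obj and not (isinstance(obj[key], str) and obj[key].strip()):
            problems.append(key + " must be a non-empty string")
    return problems

sample = json.loads("""
{
  "intent": "react_to_event",
  "line": "A short puppet line tied to the scene.",
  "emotion": "curious",
  "gesture": "tilts head toward the stage lights",
  "stage_effect": "spotlight_glow",
  "memory_update": null,
  "tool_request": null
}
""")
print(check_actor_schema(sample))
```

Returning a list of problems rather than a boolean makes it easier to log why a generation was rejected.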

tool_request may be null or an object using one of the AI Puppet Theater tool schemas:

  • inspect_prop: {"prop": "..."}
  • consult_stage_oracle: {"question": "..."}
  • change_lighting: {"mood": "..."}
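
The argument shapes listed above can be checked with a small lookup table. This is a sketch under assumptions: the card does not say how the tool name is carried inside tool_request, so the hypothetical helper `tool_args_valid` takes the name and the argument object separately. Restricting calls to the tools listed in the actor JSON is also an assumption.

```python
# Tool names and their single argument key, taken from the list above.
TOOL_ARGS = {
    "inspect_prop": "prop",
    "consult_stage_oracle": "question",
    "change_lighting": "mood",
}

def tool_args_valid(tool_name, args, allowed_tools=None):
    """Return True if args matches the documented shape for tool_name."""
    expected_key = TOOL_ARGS.get(tool_name)
    if expected_key is None:
        return False  # unknown tool
    # Assumption: an actor may only call tools listed in its own "tools" array.
    if allowed_tools is not None and tool_name not in allowed_tools:
        return False
    # Exactly one key, and it must be the documented one.
    if not isinstance(args, dict) or set(args) != {expected_key}:
        return False
    value = args[expected_key]
    return isinstance(value, str) and bool(value.strip())

print(tool_args_valid("inspect_prop", {"prop": "silver spoon"}))
print(tool_args_valid("change_lighting", {"prop": "silver spoon"}))
```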

Recommended llama.cpp Usage

Use llama-completion, not interactive llama-cli, for one-shot Actor JSON generation. For the local build used during evaluation, -no-cnv was required to avoid conversation mode.

Recommended settings:

  • binary: llama-completion
  • mode: -no-cnv
  • reasoning: --reasoning off --reasoning-budget 0
  • prompt format: chatml
  • temperature: 0

Example:

llama-completion \
  -m minicpm5-actor-q4_k_m.gguf \
  -no-cnv \
  -n 160 \
  --temp 0 \
  --reasoning off \
  --reasoning-budget 0 \
  -p '<|im_start|>system
You are an Actor agent in AI Puppet Theater. Return only one valid JSON object. No markdown. No commentary. Keep the puppet line short, theatrical, and speakable.
<|im_end|>
<|im_start|>user
premise: A moon mayor denies stealing the town last spoon
show_state JSON: {"story_phase":"complication","latest_prop":"silver spoon","finale_requested":false}
actor JSON: {"name":"Mina Moonbutton","mood":"curious","tools":["inspect_prop"]}
director_instruction: Inspect the latest prop and keep the line short.

Return exactly one JSON object with exactly these keys: intent, line, emotion, gesture, stage_effect, memory_update, tool_request. Do not omit stage_effect.
<|im_end|>
<|im_start|>assistant
'
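
The prompt and command above can also be assembled programmatically. The sketch below rebuilds the same chatml layout and the same flags from Python values; the function names `build_actor_prompt` and `build_command` are hypothetical, and the flag list simply mirrors the example rather than any verified set of defaults.

```python
import json

SYSTEM_PROMPT = (
    "You are an Actor agent in AI Puppet Theater. Return only one valid JSON object. "
    "No markdown. No commentary. Keep the puppet line short, theatrical, and speakable."
)
KEY_REMINDER = (
    "Return exactly one JSON object with exactly these keys: intent, line, emotion, "
    "gesture, stage_effect, memory_update, tool_request. Do not omit stage_effect."
)

def build_actor_prompt(premise, show_state, actor, director_instruction):
    """Assemble the chatml prompt in the layout shown in the example above."""
    compact = {"separators": (",", ":")}  # compact JSON, as in the example
    user = (
        f"premise: {premise}\n"
        f"show_state JSON: {json.dumps(show_state, **compact)}\n"
        f"actor JSON: {json.dumps(actor, **compact)}\n"
        f"director_instruction: {director_instruction}\n\n"
        f"{KEY_REMINDER}"
    )
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def build_command(model_path, prompt):
    """Return the llama-completion argument list used in the example above."""
    return [
        "llama-completion", "-m", model_path, "-no-cnv", "-n", "160",
        "--temp", "0", "--reasoning", "off", "--reasoning-budget", "0",
        "-p", prompt,
    ]

prompt = build_actor_prompt(
    premise="A moon mayor denies stealing the town last spoon",
    show_state={"story_phase": "complication", "latest_prop": "silver spoon",
                "finale_requested": False},
    actor={"name": "Mina Moonbutton", "mood": "curious", "tools": ["inspect_prop"]},
    director_instruction="Inspect the latest prop and keep the line short.",
)
command = build_command("minicpm5-actor-q4_k_m.gguf", prompt)
print(prompt)
```

The resulting list can be handed to `subprocess.run(command, capture_output=True, text=True)`; passing a list avoids shell quoting problems with the multi-line prompt.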

GGUF Eval Summary

Full GGUF eval used llama-completion with -no-cnv, reasoning disabled, and chatml prompt formatting.

Check                          | Result
-------------------------------|--------------
Total generations              | 40
Extracted JSON parse           | 39/40, 97.5%
Required fields present        | 39/40, 97.5%
Exact top-level schema         | 39/40, 97.5%
Sanitized Actor JSON usable    | 39/40, 97.5%
Strict tool_request valid      | 35/40, 87.5%
Sanitized tool_request usable  | 39/40, 97.5%
Line length pass               | 39/40, 97.5%
Interactive marker seen        | 0/40, 0.0%
Runtime error                  | 0/40, 0.0%
Timeout                        | 0/40, 0.0%
[Start thinking] seen          | 0/40, 0.0%

Runtime Notes

Use first-balanced-JSON extraction, Actor JSON schema validation, tool request sanitization, and deterministic fallback in the application runtime. This is important because local LLM outputs can include extra text, malformed tool arguments, or occasional missing fields.
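
One way to read "first-balanced-JSON extraction" with a deterministic fallback is sketched below. It scans for the first brace-balanced span that parses as JSON, ignoring braces inside string literals, and returns a fixed object when nothing usable is found. The function names and the fallback values are hypothetical; the application's real runtime may differ.

```python
import json

def extract_first_json_object(text):
    """Return the first balanced {...} span that parses as JSON, else None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Braces inside string literals must not affect the depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but invalid; try the next "{"
        start = text.find("{", start + 1)
    return None

# Hypothetical fallback values, chosen only for illustration.
FALLBACK_ACTOR = {
    "intent": "react_to_event",
    "line": "I seem to have lost my line.",
    "emotion": "curious",
    "gesture": "pauses",
    "stage_effect": "spotlight_glow",
    "memory_update": None,
    "tool_request": None,
}

def actor_or_fallback(raw_output):
    """Return the extracted Actor object, or a copy of the fixed fallback."""
    obj = extract_first_json_object(raw_output)
    if obj is None or set(obj) != set(FALLBACK_ACTOR):
        return dict(FALLBACK_ACTOR)
    return obj

noisy = 'Sure! {"line": "A spoon {shines}!", "tool_request": null} trailing }'
print(extract_first_json_object(noisy))
print(actor_or_fallback("no json here")["line"])
```

Because the fallback is a constant, the same failure always yields the same stage output, which keeps a show reproducible when a generation goes wrong.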

Caveats

  • Demo/hackathon fine-tune.
  • Synthetic Actor SFT data; not broad creative-writing training.
  • Actor-only behavior; Director planning is handled by the app.
  • Not a general chat or instruction model.
  • Validate all tool calls before execution.
  • Best results depend on the documented llama.cpp prompt/runtime settings.

Related Artifacts
