Beholder — q4f16 (in-browser / WebLLM)

This is the MLC q4f16_1 build of the Beholder state extractor. It runs fully in the browser on WebGPU via WebLLM, and also works against any OpenAI-compatible endpoint.

Beholder reads roleplay / narrative prose and emits structured character state — clothing, colors, materials, held items, and wounds — per body slot, so a paperdoll panel can track what characters are wearing and carrying without the roleplay model losing the thread.
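As a rough illustration of what "structured character state per body slot" could look like on the consuming side, here is a minimal TypeScript sketch. The type names, field names, and slot keys (`CharacterState`, `SlotState`, `torso`, `held`) are assumptions made for this example, not Beholder's documented output schema; check the actual model output before relying on them.

```typescript
// Hypothetical shapes: these field and slot names are illustrative only,
// not Beholder's documented schema.
type SlotState = {
  item: string | null;     // e.g. "cloak"
  color: string | null;    // e.g. "red"
  material: string | null; // e.g. "wool"
  wound: string | null;    // e.g. "shallow cut"
};

type CharacterState = {
  name: string;
  slots: Record<string, SlotState>; // keyed by body slot, e.g. "torso"
  held: string[];                   // items carried in hand
};

// Returns the value if it is a string, otherwise null.
function str(x: unknown): string | null {
  return typeof x === "string" ? x : null;
}

// Parses model output defensively; returns null when the text is not
// valid JSON or lacks the fields this sketch assumes.
function parseCharacterState(raw: string): CharacterState | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;

  const name = str(obj["name"]);
  const slotsRaw = obj["slots"];
  if (name === null) return null;
  if (typeof slotsRaw !== "object" || slotsRaw === null || Array.isArray(slotsRaw)) return null;

  const slots: Record<string, SlotState> = {};
  for (const [slot, value] of Object.entries(slotsRaw as Record<string, unknown>)) {
    if (typeof value !== "object" || value === null) continue; // skip malformed slots
    const v = value as Record<string, unknown>;
    slots[slot] = {
      item: str(v["item"]),
      color: str(v["color"]),
      material: str(v["material"]),
      wound: str(v["wound"]),
    };
  }

  const heldRaw = obj["held"];
  const held: string[] = Array.isArray(heldRaw)
    ? heldRaw.filter((h): h is string => typeof h === "string")
    : [];

  return { name, slots, held };
}
```

Validating before rendering keeps a paperdoll panel from breaking when the model emits truncated or malformed output.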

The Beholder extension polls version.json and offers a one-click update when a newer build is published here.
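The update check described above might be sketched as follows. This is not the extension's actual code: the manifest shape (a single `version` field holding a dotted numeric string) and the function names are assumptions, and the manifest loader is injected so the comparison logic stays independent of how version.json is fetched.

```typescript
// Hypothetical manifest shape; the real version.json fields may differ.
type VersionManifest = { version: string };

// Compares dotted numeric versions such as "1.2.0" and "1.10".
// Returns -1 if a < b, 1 if a > b, 0 if equal. Missing parts count as 0.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((n) => parseInt(n, 10) || 0);
  const pb = b.split(".").map((n) => parseInt(n, 10) || 0);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// True when the published manifest is newer than the installed build.
function updateAvailable(installed: string, manifest: VersionManifest): boolean {
  return compareVersions(installed, manifest.version) < 0;
}

// Polling step: loadManifest is supplied by the caller (for example a
// fetch of version.json). Resolves to the newer version string, or null.
async function checkForUpdate(
  installed: string,
  loadManifest: () => Promise<VersionManifest>,
): Promise<string | null> {
  const manifest = await loadManifest();
  return updateAvailable(installed, manifest) ? manifest.version : null;
}
```

Comparing parts numerically rather than as strings avoids treating "1.10.0" as older than "1.2.0".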

Contents

params_shard_*.bin + tensor-cache*.json — quantized 4-bit weights
Beholder-q4f16-webgpu.wasm — compiled WebGPU model library (WebLLM model_lib)
mlc-chat-config.json — runtime config (browser-right-sized context)
tokenizer.json / tokenizer_config.json
version.json — update manifest
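To show how these files map onto a WebLLM configuration, the sketch below builds a model record with the `model`, `model_id`, and `model_lib` fields that WebLLM's `AppConfig.model_list` entries use. The repository URL, the `/resolve/main/` path to the wasm file, and the `model_id` value are assumptions based on the usual Hugging Face layout; adjust them to wherever the files are actually hosted.

```typescript
// Mirrors the subset of WebLLM's ModelRecord fields used here.
type ModelRecord = { model: string; model_id: string; model_lib: string };

// Builds a model record from the repository URL that hosts the weights.
// The wasm path and model_id are assumptions for illustration.
function beholderModelRecord(repoUrl: string): ModelRecord {
  const base = repoUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    model: base, // weights, tokenizer, and mlc-chat-config.json live here
    model_id: "Beholder-q4f16",
    model_lib: `${base}/resolve/main/Beholder-q4f16-webgpu.wasm`,
  };
}

const appConfig = {
  model_list: [beholderModelRecord("https://huggingface.co/GetBeholder/Beholder-q4f16")],
};

// In a browser with @mlc-ai/web-llm installed, this would typically be used as:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine("Beholder-q4f16", { appConfig });
```

The `model_id` passed to `CreateMLCEngine` has to match the `model_id` in the record, which is why both come from the same place here.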

Notes

The 4-bit build is the smallest and most browser-friendly (~430 MB). For higher fidelity on native runtimes, use the 8-bit GGUF build at GetBeholder/Beholder-GGUF.
Use near-greedy decoding (low temperature) for stable structured output.
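A near-greedy request could be assembled as in the sketch below, using the standard OpenAI chat-completions fields (`messages`, `temperature`, `top_p`, `max_tokens`). The system prompt text, the default model name, and the specific values (0.1, 512) are placeholders chosen for this example, not settings recommended by the Beholder authors.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds a chat-completions request body with near-greedy sampling.
// Prompt wording, model name, and numeric values are illustrative assumptions.
function buildExtractionRequest(prose: string, model: string = "Beholder-q4f16") {
  const messages: ChatMessage[] = [
    { role: "system", content: "Extract character state as JSON." }, // hypothetical prompt
    { role: "user", content: prose },
  ];
  return {
    model,
    messages,
    temperature: 0.1, // low temperature keeps the structured output stable
    top_p: 1,
    max_tokens: 512,  // assumed cap; size it to the expected output
  };
}

const request = buildExtractionRequest("Mira pulls on a red wool cloak.");
```

The same object shape can be POSTed to an OpenAI-compatible `/v1/chat/completions` route, or passed to `engine.chat.completions.create(...)` in WebLLM, which follows the same request format.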

License

PolyForm Noncommercial 1.0.0. Commercial use by permission.

