Beholder: q4f16 (in-browser / WebLLM)

The MLC q4f16_1 build of the Beholder state extractor. It runs fully in the browser on WebGPU via WebLLM, and also works against any OpenAI-compatible endpoint.

Beholder reads roleplay / narrative prose and emits structured character state (clothing, colors, materials, held items, and wounds) per body slot, so a paperdoll panel can track what characters are wearing and carrying without the roleplay model losing the thread.

The Beholder extension polls version.json and offers a one-click update when a newer build publishes here.
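
The update check can be pictured as a simple version comparison. The sketch below is only an illustration, not the extension's actual logic: the schema of version.json is not documented here, so the `version` field and its dotted-number format are assumptions, and `isNewerVersion` and `shouldOfferUpdate` are hypothetical helpers.

```typescript
// Hypothetical manifest shape: the real version.json schema may differ.
interface VersionManifest {
  version: string; // assumed dotted-number string such as "1.2.0"
}

// Split "1.2.0" into [1, 2, 0]; non-numeric parts count as 0.
function parseVersion(v: string): number[] {
  return v.split(".").map((part) => {
    const n = parseInt(part, 10);
    return Number.isNaN(n) ? 0 : n;
  });
}

// Return true when `remote` is strictly newer than `local`.
function isNewerVersion(remote: string, local: string): boolean {
  const a = parseVersion(remote);
  const b = parseVersion(local);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0; // missing components are treated as 0
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// Decide whether to offer the update for a fetched manifest.
function shouldOfferUpdate(manifest: VersionManifest, installed: string): boolean {
  return isNewerVersion(manifest.version, installed);
}
```

In a browser, the manifest would typically be fetched and parsed as JSON before being passed to `shouldOfferUpdate` together with the installed build's version string.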

Contents

  • params_shard_*.bin + tensor-cache*.json: quantized 4-bit weights
  • Beholder-q4f16-webgpu.wasm: compiled WebGPU model library (WebLLM model_lib)
  • mlc-chat-config.json: runtime config (browser-right-sized context)
  • tokenizer.json / tokenizer_config.json
  • version.json: update manifest
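
The files above map onto a WebLLM custom-model entry roughly as sketched below. This is an illustration under assumptions: the field names (`model`, `model_id`, `model_lib`) follow WebLLM's `ModelRecord` as it is commonly documented, while `baseUrl`, the model ID, and the `buildModelRecord` / `buildAppConfig` helpers are hypothetical. The sketch assumes the files are served from a plain static directory. A Hugging Face repo URL usually needs a `resolve/main/` segment in file paths, which is not added here.

```typescript
// Local stand-in for WebLLM's ModelRecord (only the fields used here).
interface ModelRecordSketch {
  model: string;     // URL of the folder holding weights, config and tokenizer
  model_id: string;  // name used to select the model at load time
  model_lib: string; // URL of the compiled WebGPU .wasm library
}

// Build a record from the base URL where this repo's files are served.
// `baseUrl` and `modelId` are placeholders supplied by the caller.
function buildModelRecord(baseUrl: string, modelId: string): ModelRecordSketch {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    model: root,
    model_id: modelId,
    model_lib: `${root}/Beholder-q4f16-webgpu.wasm`,
  };
}

// Assumed shape of the appConfig passed when creating a WebLLM engine.
function buildAppConfig(baseUrl: string, modelId: string): { model_list: ModelRecordSketch[] } {
  return { model_list: [buildModelRecord(baseUrl, modelId)] };
}
```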

Notes

  • The 4-bit build is the smallest, browser-friendly option (~430 MB). For higher fidelity on native runtimes, use the 8-bit GGUF → GetBeholder/Beholder-GGUF.
  • Use near-greedy decoding (low temperature) for stable structured output.
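
As an illustration of the near-greedy setting, the sketch below builds a chat-completion request body in the OpenAI-compatible format with a low temperature. The system prompt, the `max_tokens` cap, the exact temperature value, and the `buildExtractionRequest` helper are assumptions for demonstration. Beholder's actual prompt format is not described here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  top_p: number;
  max_tokens: number;
}

// Build a request body for an OpenAI-compatible chat completions call.
function buildExtractionRequest(prose: string, model: string): ChatRequest {
  return {
    model,
    messages: [
      // Placeholder instruction: the real Beholder prompt may differ.
      { role: "system", content: "Extract character state from the passage." },
      { role: "user", content: prose },
    ],
    temperature: 0.1, // low temperature: near-greedy sampling
    top_p: 1,         // no nucleus truncation on top of the low temperature
    max_tokens: 512,  // arbitrary cap chosen for illustration
  };
}
```

The resulting object can be sent as JSON to the chat completions route of an OpenAI-compatible server. WebLLM's engine accepts a similarly shaped request through its `chat.completions.create` method.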

License

PolyForm Noncommercial 1.0.0. Commercial use by permission.
