Beholder: q4f16 (in-browser / WebLLM)
The MLC q4f16_1 build of the Beholder state-extractor: runs fully in the browser on
WebGPU via WebLLM, and against any OpenAI-compatible endpoint.
Beholder reads roleplay / narrative prose and emits structured character state (clothing, colors, materials, held items, and wounds) per body slot, so a paperdoll panel can track what characters are wearing and carrying without the roleplay model losing the thread.
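To make "state per body slot" concrete, here is a minimal sketch of how a paperdoll panel might hold and update that state. The shapes `SlotState` and `CharacterState`, the slot names, and the merge rule are all hypothetical and chosen for illustration; Beholder's actual output schema is not described here and may differ.

```typescript
// Hypothetical shapes for illustration only; not Beholder's documented schema.
interface SlotState {
  item: string | null; // what occupies the slot, or null if it is empty
  color?: string;
  material?: string;
  wound?: string;
}

interface CharacterState {
  name: string;
  slots: { [slot: string]: SlotState };
  held: string[];
}

// Merge a newer extraction into the tracked state without mutating either input.
// Assumed rule: slots present in `next` replace the old slot; slots that `next`
// does not mention keep their previous value; held items are replaced wholesale.
function applyExtraction(prev: CharacterState, next: CharacterState): CharacterState {
  const slots: { [slot: string]: SlotState } = {};
  for (const key of Object.keys(prev.slots)) {
    slots[key] = prev.slots[key];
  }
  for (const key of Object.keys(next.slots)) {
    slots[key] = next.slots[key];
  }
  return { name: prev.name, slots: slots, held: next.held.slice() };
}
```

Under these assumptions, a character who swaps a shirt for a cuirass keeps their boots, because the newer extraction only mentions the torso slot.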
The Beholder extension polls version.json and offers a one-click update when a newer build is published here.
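The update check can be sketched roughly as follows. This assumes version.json carries a dotted numeric `version` field; that shape (`UpdateManifest`) and the helper names are hypothetical, since the manifest format is not documented here, and the extension's real logic may differ.

```typescript
// Hypothetical manifest shape; the real version.json schema may differ.
interface UpdateManifest {
  version: string; // assumed dotted numeric form, e.g. "1.4.0"
}

// Split "1.4.0" into [1, 4, 0], rejecting parts that are not plain digits.
function parseVersion(v: string): number[] {
  return v.split(".").map(function (part) {
    if (!/^\d+$/.test(part)) {
      throw new Error("unsupported version part: " + part);
    }
    return parseInt(part, 10);
  });
}

// True when `remote` is strictly newer than `installed`.
// Missing trailing parts count as 0, so "1.2" equals "1.2.0".
function isNewer(remote: string, installed: string): boolean {
  const a = parseVersion(remote);
  const b = parseVersion(installed);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// Decide whether to offer an update, given the fetched manifest text.
function updateAvailable(manifestJson: string, installed: string): boolean {
  const manifest = JSON.parse(manifestJson) as UpdateManifest;
  return isNewer(manifest.version, installed);
}
```

Comparing parts numerically rather than as strings matters once a component reaches two digits: "1.10.0" is newer than "1.9.0", which a plain string comparison would get wrong.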
Contents
- params_shard_*.bin + tensor-cache*.json: quantized 4-bit weights
- Beholder-q4f16-webgpu.wasm: compiled WebGPU model library (WebLLM model_lib)
- mlc-chat-config.json: runtime config (browser-right-sized context)
- tokenizer.json / tokenizer_config.json
- version.json: update manifest
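The files above map onto a WebLLM custom model record roughly as sketched below. The `model`, `model_id`, and `model_lib` field names follow WebLLM's custom-model configuration as I understand it; the `resolve/main` URL layout and the chosen `model_id` string are assumptions, so check them against the WebLLM documentation before relying on them.

```typescript
// Sketch of a WebLLM model record for this repository.
// Assumption: WebLLM resolves mlc-chat-config.json, the tokenizer files, and the
// weight shards relative to `model`, and loads the compiled library from `model_lib`.
interface ModelRecord {
  model: string;     // repository URL holding the config and weight shards
  model_id: string;  // local identifier; the value used here is our own choice
  model_lib: string; // URL of the compiled WebGPU .wasm library
}

const BEHOLDER_REPO = "GetBeholder/Beholder-q4f16";

function beholderModelRecord(): ModelRecord {
  const base = "https://huggingface.co/" + BEHOLDER_REPO;
  return {
    model: base,
    model_id: "Beholder-q4f16",
    model_lib: base + "/resolve/main/Beholder-q4f16-webgpu.wasm",
  };
}

// The app config that would be handed to WebLLM when creating an engine.
const beholderAppConfig = { model_list: [beholderModelRecord()] };
```

In a browser app, `beholderAppConfig` would typically be passed as the `appConfig` option when creating the engine with the `@mlc-ai/web-llm` package, using the same `model_id` to select the model.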
Notes
- 4-bit is the smallest, browser-friendly build (~430 MB). For higher fidelity on native runtimes, use the 8-bit GGUF: GetBeholder/Beholder-GGUF.
- Use near-greedy decoding (low temperature) for stable structured output.
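As a sketch of what near-greedy decoding looks like in practice, the helper below builds an OpenAI-style chat completion request body with a low temperature. The `temperature`, `top_p`, `messages`, and `stream` fields are standard for OpenAI-compatible chat endpoints; the default of 0.1, the `model` string, and sending the prose as a single user message are assumptions, since the expected prompt format is not specified here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  top_p: number;
  stream: boolean;
}

// Build a request body for structured extraction with near-greedy sampling.
// Assumption: 0.1 is an illustrative "low" temperature, not a documented setting.
function buildExtractionRequest(prose: string, temperature: number = 0.1): ChatCompletionRequest {
  if (!(temperature >= 0 && temperature <= 2)) {
    throw new RangeError("temperature must be between 0 and 2");
  }
  return {
    model: "Beholder-q4f16", // assumed model id; use whatever your endpoint exposes
    messages: [{ role: "user", content: prose }],
    temperature: temperature,
    top_p: 1,
    stream: false,
  };
}

// Serialized body, ready to send as JSON in a POST request.
const extractionBody = JSON.stringify(
  buildExtractionRequest("Mira pulls a red wool cloak over her shoulders.")
);
```

With an OpenAI-compatible server, this body would typically be POSTed to its `/v1/chat/completions` route. Keeping the temperature low reduces sampling variation, which helps the structured output stay consistent from one extraction to the next.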
License
PolyForm Noncommercial 1.0.0. Commercial use by permission.