Spaces:
Runtime error
A newer version of the Gradio SDK is available: 6.19.0
hackathon: Build Small (2026)
title: Dreadzone
emoji: 💬
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
suggested_hardware: t4-small
license: artistic-2.0
short_description: Backrooms-inspired local GGUF experience
team:
- grimjim
tags:
- track:wood
- sponsor:openai
- sponsor:nvidia
- achievement:offgrid
- achievement:llama
social_media_post: >-
https://www.linkedin.com/posts/jim-lai-038249_i-participated-in-the-build-small-hackathon-share-7472113354073853952-LA39/
An entry for the Build Small Hackathon (2026) The track taken: Thousand Token Wood
Dreadzone is a Backrooms-inspired interactive fiction prototype that runs a
local GGUF model with llama-cpp-python and Gradio ChatInterface.
The app downloads
unsloth/NVIDIA-Nemotron-3-Nano-4B-GGUF
automatically on first launch and streams responses from
NVIDIA-Nemotron-3-Nano-4B-Q5_K_M.gguf.
No hosted inference API, OAuth token, secrets, or external inference services are
used. The default dependency pin uses the CUDA 12.4 llama-cpp-python wheel for
GPU Spaces.
The Python app owns the lightweight game state: coordinates, turn count, sanity, zone profile, and encounter rolls. The model receives hidden state each turn and narrates the result without exposing coordinates or mechanics. There are a few surprises to keep players on their toes.
Runtime settings
The defaults are intentionally conservative while enabling GPU offload:
N_CTX=2048N_BATCH=128MAX_HISTORY_TURNS=6GAME_SEED=dreadzoneN_THREADSdefaults to one fewer than the detected CPU countN_GPU_LAYERS=-1offloads all possible layers to GPUENABLE_THINKING=falserenders the model chat template with thinking disabled
You can override the model or runtime settings with Space variables:
MODEL_REPOMODEL_FILEMODEL_DIRGAME_SEEDN_CTXN_BATCHN_THREADSN_GPU_LAYERSENABLE_THINKINGMAX_HISTORY_TURNS
Author
grimjim@huggingface
Assisted by Codex