File size: 2,093 Bytes
d2076fc
bdf3624
89e66bf
d2076fc
 
3f45f47
7f0712e
441317c
7f0712e
d2076fc
c75d321
89e66bf
32d6660
cb6ffb8
 
bdf3624
58f63db
 
 
 
 
0bbb564
d2076fc
10f3850
 
d2076fc
32d6660
 
3f45f47
 
 
 
 
 
 
c75d321
 
3f45f47
32d6660
 
10f3850
 
32d6660
c75d321
3f45f47
c75d321
3f45f47
 
 
 
32d6660
3f45f47
c75d321
8bef568
3f45f47
 
 
 
 
 
32d6660
3f45f47
 
 
c75d321
8bef568
3f45f47
10f3850
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
hackathon: Build Small (2026)
title: Dreadzone
emoji: 💬
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
suggested_hardware: t4-small
license: artistic-2.0
short_description: Backrooms-inspired local GGUF experience
team:
  - grimjim
tags:
  - track:wood
  - sponsor:openai
  - sponsor:nvidia
  - achievement:offgrid
  - achievement:llama
social_media_post: https://www.linkedin.com/posts/jim-lai-038249_i-participated-in-the-build-small-hackathon-share-7472113354073853952-LA39/
---
An entry for the Build Small Hackathon (2026)
The track taken: Thousand Token Wood

Dreadzone is a Backrooms-inspired interactive fiction prototype that runs a
local GGUF model with `llama-cpp-python` and Gradio ChatInterface.

The app downloads
[`unsloth/NVIDIA-Nemotron-3-Nano-4B-GGUF`](https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Nano-4B-GGUF)
automatically on first launch and streams responses from
`NVIDIA-Nemotron-3-Nano-4B-Q5_K_M.gguf`.

No hosted inference API, OAuth token, secrets, or external inference services are
used. The default dependency pin uses the CUDA 12.4 `llama-cpp-python` wheel for
GPU Spaces.

The Python app owns the lightweight game state: coordinates, turn count, sanity,
zone profile, and encounter rolls. The model receives hidden state each turn and
narrates the result without exposing coordinates or mechanics. There are a few
surprises to keep players on their toes.

## Runtime settings

The defaults are intentionally conservative while enabling GPU offload:

- `N_CTX=2048`
- `N_BATCH=128`
- `MAX_HISTORY_TURNS=6`
- `GAME_SEED=dreadzone`
- `N_THREADS` defaults to one fewer than the detected CPU count
- `N_GPU_LAYERS=-1` offloads all possible layers to GPU
- `ENABLE_THINKING=false` renders the model chat template with thinking disabled

You can override the model or runtime settings with Space variables:

- `MODEL_REPO`
- `MODEL_FILE`
- `MODEL_DIR`
- `GAME_SEED`
- `N_CTX`
- `N_BATCH`
- `N_THREADS`
- `N_GPU_LAYERS`
- `ENABLE_THINKING`
- `MAX_HISTORY_TURNS`

## Author

grimjim@huggingface

Assisted by Codex