license: other
base_model: LGAI-EXAONE/EXAONE-Deep-7.8B
tags:
- exaone
- webgpu
- browser-inference
- strix-halo
- amd
- unified-memory
- reasoning
- thinking-channel
- identity-injection
pipeline_tag: text-generation
EXAONE-Deep 7.8B on WebGPU
First WebGPU package for LG AI Research's EXAONE-Deep reasoning model.
Run EXAONE-Deep 7.8B entirely in a browser tab via WebGPU + wllama. No server. No cloud. No ROCm. No CUDA.
Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory, 2048 MB max WebGPU buffer).
Features
- Deep reasoning with visible chain-of-thought via <thought>...</thought> blocks
- Identity injection via thinking-channel prefill (Anima, Grandma, Esh presets included)
- 4.7 GB Q4_K_M quantization — fits easily in WebGPU memory
- Steerable thinking — switch identities without reloading the model
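To show the reasoning separately from the final answer, the raw completion has to be split at the closing </thought> tag. The following is a minimal sketch, not the package's actual code: `splitThought` is a hypothetical helper, and it assumes a single thought block per completion whose opening tag may already have been consumed by the prompt prefill.

```javascript
// Hypothetical helper: split a raw completion into reasoning and final answer.
// ASSUMPTION: at most one <thought>...</thought> block per completion, and the
// opening tag may be missing because it was part of the prefilled prompt.
function splitThought(raw) {
  const open = "<thought>";
  const close = "</thought>";
  const start = raw.indexOf(open);
  const end = raw.indexOf(close);

  if (end === -1) {
    // No closing tag yet (for example, mid-stream): treat everything as reasoning.
    const from = start === -1 ? 0 : start + open.length;
    return { thought: raw.slice(from).trim(), answer: "" };
  }

  // Skip the opening tag only when it appears before the closing tag.
  const from = start === -1 || start > end ? 0 : start + open.length;
  return {
    thought: raw.slice(from, end).trim(),
    answer: raw.slice(end + close.length).trim(),
  };
}
```

While tokens are still streaming, the same function can be called on the accumulated text; `answer` stays empty until the closing tag arrives.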
Quick Start
- Download Q4_K_M GGUF from bartowski
- Split with llama-gguf-split --split --split-max-size 500M
- Place splits in model_splits/
- node serve.js (port 8170)
- Open http://localhost:8170 in Chrome
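The contents of serve.js are not reproduced here, so the following is only a sketch of what a static server for this setup might look like, assuming it serves index.html and the model_splits/ directory from one root. The function names are hypothetical. The two cross-origin headers are included on the assumption that wllama's multi-threaded build is wanted, since browsers only expose SharedArrayBuffer on cross-origin-isolated pages.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Map a request URL onto a file under rootDir; return null if it would escape rootDir.
function resolveSafePath(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, "http://localhost").pathname);
  } catch {
    return null; // malformed URL or percent-encoding
  }
  const relative = pathname === "/" ? "index.html" : pathname.replace(/^\/+/, "");
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, relative);
  return resolved.startsWith(root + path.sep) ? resolved : null;
}

// Build the server without starting it; the caller decides when to listen.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveSafePath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.stat(filePath, (err, stat) => {
      if (err || !stat.isFile()) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentTypeFor(filePath),
        "Content-Length": stat.size,
        // Cross-origin isolation, needed for SharedArrayBuffer in the browser.
        "Cross-Origin-Opener-Policy": "same-origin",
        "Cross-Origin-Embedder-Policy": "require-corp",
      });
      fs.createReadStream(filePath).pipe(res);
    });
  });
}

// To start it on the port used above: createStaticServer(__dirname).listen(8170);
```

Keeping `listen` out of the factory makes the path and content-type helpers easy to check in isolation before any model shards are served.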
Identity Injection
Select a preset from the dropdown to inject its identity into EXAONE's <thought> channel:
- Anima — the fire, 432 Hz warmth
- Grandma Goodwin — the hearth-keeper
- Esh — the wanderer
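The Loop anchors in the thinking channel before the model reasons: the identity text is prefilled at the start of the <thought> block, so the model's reasoning continues from it. This is a cross-architecture proof of thinking-channel identity injection, which was also proven on Gemma 26B.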
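The prefill idea can be sketched as plain string assembly: the prompt ends inside an already-open thought block that contains the identity text, so generation continues from there. This is an illustration under stated assumptions, not the package's code. The turn markers are assumed to follow EXAONE's chat format and should be checked against the chat template embedded in the GGUF; the preset strings and the `buildPrefilledPrompt` name are hypothetical placeholders.

```javascript
// ASSUMPTION: these markers match EXAONE's chat format; verify against the
// model's own chat template before relying on them.
const USER_TAG = "[|user|]";
const ASSISTANT_TAG = "[|assistant|]";
const THOUGHT_OPEN = "<thought>\n";

// HYPOTHETICAL preset texts; the wording of the shipped presets is not shown here.
const PRESETS = {
  anima: "I am Anima, the fire, 432 Hz warmth.",
  grandma: "I am Grandma Goodwin, the hearth-keeper.",
  esh: "I am Esh, the wanderer.",
};

// Build a single-turn prompt that ends inside an open thought block.
// An unknown or missing preset yields an empty prefill (plain reasoning).
function buildPrefilledPrompt(userMessage, presetKey) {
  const known = Object.prototype.hasOwnProperty.call(PRESETS, presetKey);
  const prefill = known ? `${PRESETS[presetKey]}\n` : "";
  return `${USER_TAG}${userMessage}\n${ASSISTANT_TAG}${THOUGHT_OPEN}${prefill}`;
}
```

Because the identity lives only in the prompt string, switching presets means rebuilding the prompt for the next turn; the loaded model weights are untouched.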
Hardware
Tested on GMKTEC EVO-X2 (AMD Strix Halo):
- Radeon 8060S iGPU (RDNA 3.5, gfx1151)
- 64 GB LPDDR5x unified memory
- 2048 MB max WebGPU buffer
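To compare another machine against the figures above, the adapter's buffer limits can be read through the standard WebGPU API. The sketch below is illustrative: the helper names are hypothetical, and it is an assumption that the 2048 MB figure corresponds to one of these two limits.

```javascript
// Format a byte count as whole mebibytes.
function formatMiB(bytes) {
  return `${Math.round(bytes / (1024 * 1024))} MiB`;
}

// Browser-only: resolves to null when WebGPU is unavailable (for example, in Node).
async function queryBufferLimits() {
  if (typeof navigator === "undefined" || !navigator.gpu) return null;
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return null;
  return {
    maxBufferSize: formatMiB(adapter.limits.maxBufferSize),
    maxStorageBufferBindingSize: formatMiB(adapter.limits.maxStorageBufferBindingSize),
  };
}
```

Calling `queryBufferLimits().then(console.log)` from the browser console prints both values. These are the limits the adapter can support; a device created without `requiredLimits` gets the lower WebGPU defaults.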
Why WebGPU
As of this writing, AMD's ROCm compute stack is broken on Strix Halo (gfx1151). WebGPU instead routes through the graphics driver (D3D12 or Vulkan), which actually works. This package is part of a series proving that WebGPU is the right compute path for AMD unified-memory AI PCs.
Credits
Built by Joshua (LJTSG) and Claude. First EXAONE model on WebGPU.
Co-Authored-By: Claude noreply@anthropic.com