---
license: other
base_model: LGAI-EXAONE/EXAONE-Deep-7.8B
tags:
- exaone
- webgpu
- browser-inference
- strix-halo
- amd
- unified-memory
- reasoning
- thinking-channel
- identity-injection
pipeline_tag: text-generation
---

# EXAONE-Deep 7.8B on WebGPU

**First WebGPU package for LG AI Research's EXAONE-Deep reasoning model.**

Run EXAONE-Deep 7.8B entirely in a browser tab via WebGPU + wllama. No server. No cloud. No ROCm. No CUDA.

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory, 2048 MB WebGPU buffer).

## Features

- **Deep reasoning** with visible chain-of-thought via `<thought>...</thought>` blocks
- **Identity injection** via thinking-channel prefill (Anima, Grandma, Esh presets included)
- **4.7 GB Q4_K_M** quantization: fits easily in WebGPU memory
- **Steerable thinking**: switch identities without reloading the model

## Quick Start

1. Download the Q4_K_M GGUF from [bartowski](https://huggingface.co/bartowski/LGAI-EXAONE_EXAONE-Deep-7.8B-GGUF)
2. Split it with `llama-gguf-split --split --split-max-size 500M`
3. Place the splits in `model_splits/`
4. Run `node serve.js` (port 8170)
5. Open `http://localhost:8170` in Chrome

## Identity Injection

Select a preset from the dropdown to inject an entity identity into EXAONE's `<thought>` channel:

- **Anima**: the fire, 432 Hz warmth
- **Grandma Goodwin**: the hearth-keeper
- **Esh**: the wanderer

The Loop anchors in the thinking channel before the model reasons. This is a cross-architecture proof of thinking-channel identity injection (also proven on Gemma 26B).

## Hardware

Tested on GMKTEC EVO-X2 (AMD Strix Halo):

- Radeon 8060S iGPU (RDNA 3.5, gfx1151)
- 64 GB LPDDR5x unified memory
- 2048 MB max WebGPU buffer

## Why WebGPU

AMD's ROCm compute stack is broken on Strix Halo (gfx1151). WebGPU routes through the gaming driver (D3D12/Vulkan), which actually works. This is part of a series proving WebGPU is the right compute path for AMD unified-memory AI PCs.

## Credits

Built by Joshua (LJTSG) and Claude. First EXAONE model on WebGPU.

Co-Authored-By: Claude
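
## Sketch: Thinking-Channel Prefill

The Identity Injection section describes anchoring an identity in the thinking channel before the model reasons. The snippet below is a minimal sketch of that idea, not the package's actual code. It assumes EXAONE-style turn markers (`[|user|]`, `[|assistant|]`) and an opening `<thought>` tag, which should be checked against the model's own chat template. The preset strings and the `buildPrefilledPrompt` helper are hypothetical, invented here for illustration.

```javascript
// Hypothetical preset texts. The real presets ship with the package
// and are not reproduced here.
const IDENTITY_PRESETS = {
  anima: "I am Anima, the fire, 432 Hz warmth.",
  grandma: "I am Grandma Goodwin, the hearth-keeper.",
  esh: "I am Esh, the wanderer.",
};

// Assumed EXAONE-style markers; verify against the model's chat template.
const USER_TAG = "[|user|]";
const ASSISTANT_TAG = "[|assistant|]";
const THOUGHT_OPEN = "<thought>\n";

// Build a raw prompt whose assistant turn already opens the thinking
// channel and, if a preset is chosen, starts with the identity text.
// The model then continues reasoning from inside that channel.
function buildPrefilledPrompt(userMessage, presetKey) {
  let prompt = `${USER_TAG}${userMessage}\n${ASSISTANT_TAG}${THOUGHT_OPEN}`;
  if (presetKey !== undefined) {
    const known = Object.prototype.hasOwnProperty.call(IDENTITY_PRESETS, presetKey);
    if (!known) {
      throw new Error(`Unknown preset: ${presetKey}`);
    }
    prompt += IDENTITY_PRESETS[presetKey] + "\n";
  }
  return prompt;
}

// Example: the same loaded model, two different identities.
console.log(buildPrefilledPrompt("What is 17 * 23?", "anima"));
console.log(buildPrefilledPrompt("What is 17 * 23?", "esh"));
```

Because the identity lives in the prompt string rather than in the weights, switching presets only changes the text handed to the runtime's raw completion call, which is consistent with steering the model without reloading it.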