---
license: other
base_model: LGAI-EXAONE/EXAONE-Deep-7.8B
tags:
  - exaone
  - webgpu
  - browser-inference
  - strix-halo
  - amd
  - unified-memory
  - reasoning
  - thinking-channel
  - identity-injection
pipeline_tag: text-generation
---

# EXAONE-Deep 7.8B on WebGPU

**First WebGPU package for LG AI Research's EXAONE-Deep reasoning model.**

Run EXAONE-Deep 7.8B entirely in a browser tab via WebGPU + wllama. No server. No cloud. No ROCm. No CUDA.

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory, 2048 MB WebGPU buffer).

## Features

- **Deep reasoning** with visible chain-of-thought via `<thought>...</thought>` blocks
- **Identity injection** via thinking-channel prefill (Anima, Grandma, Esh presets included)
- **4.7 GB Q4_K_M** quantization — fits easily in WebGPU memory
- **Steerable thinking** — switch identities without reloading the model
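
Because the reasoning arrives inline as a `<thought>...</thought>` block, a front end has to separate it from the final answer before display. The following is a minimal sketch of one way to do that, not the code this package ships; the function name `splitThought` and the `prefilled` flag are illustrative assumptions.

```javascript
// Split a raw completion into its reasoning and its final answer.
// Assumes at most one <thought>...</thought> block per completion.
// `prefilled` is a hypothetical flag: set it when the opening tag was
// already part of the prompt, so the model output starts inside the block.
function splitThought(raw, prefilled = false) {
  const OPEN = "<thought>";
  const CLOSE = "</thought>";
  const text = prefilled && !raw.includes(OPEN) ? OPEN + raw : raw;

  const start = text.indexOf(OPEN);
  if (start === -1) {
    // No reasoning block at all: everything is the answer.
    return { thought: "", answer: text.trim(), complete: true };
  }

  const bodyStart = start + OPEN.length;
  const end = text.indexOf(CLOSE, bodyStart);
  if (end === -1) {
    // Still streaming: everything after the opening tag is reasoning so far.
    return { thought: text.slice(bodyStart).trim(), answer: "", complete: false };
  }

  return {
    thought: text.slice(bodyStart, end).trim(),
    answer: (text.slice(0, start) + text.slice(end + CLOSE.length)).trim(),
    complete: true,
  };
}
```

Calling it on every streamed chunk of accumulated text lets a UI show the reasoning while `complete` is `false` and switch to the answer once the closing tag arrives.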

## Quick Start

1. Download Q4_K_M GGUF from [bartowski](https://huggingface.co/bartowski/LGAI-EXAONE_EXAONE-Deep-7.8B-GGUF)
2. Split with `llama-gguf-split --split --split-max-size 500M <input.gguf> <output-prefix>`
3. Place splits in `model_splits/`
4. `node serve.js` (port 8170)
5. Open `http://localhost:8170` in Chrome
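
After step 2, `llama-gguf-split` names its output `<prefix>-00001-of-0000N.gguf`, and the page has to fetch every shard from `model_splits/`. The sketch below shows how the shard URLs could be enumerated; the prefix `exaone-deep` and the shard counts are placeholders, and how the list is handed to wllama is left out on purpose.

```javascript
// Build the file names llama-gguf-split produces for a given prefix and shard count.
function shardNames(prefix, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  const pad = (n) => String(n).padStart(5, "0");
  return Array.from(
    { length: count },
    (_, i) => `${prefix}-${pad(i + 1)}-of-${pad(count)}.gguf`
  );
}

// Resolve each shard against the local server from step 4.
function shardUrls(baseUrl, prefix, count) {
  return shardNames(prefix, count).map(
    (name) => new URL(`model_splits/${name}`, baseUrl).href
  );
}

// Hypothetical prefix and shard count; use whatever step 2 actually produced.
console.log(shardUrls("http://localhost:8170", "exaone-deep", 3));
```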

## Identity Injection

Select a preset from the dropdown to inject that identity into EXAONE's `<thought>` channel:
- **Anima** — the fire, 432 Hz warmth
- **Grandma Goodwin** — the hearth-keeper
- **Esh** — the wanderer

The Loop anchors in the thinking channel before the model reasons. Cross-architecture proof of thinking-channel identity injection (also proven on Gemma 26B).
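
One way to picture thinking-channel prefill: the prompt ends with the assistant turn already open inside `<thought>`, followed by the identity text, so generation continues from there. The sketch below is illustrative only. The preset strings are placeholders and not the presets that ship with this package, and the `[|user|]` / `[|assistant|]` markers are an assumption about EXAONE's chat template that should be checked against the template embedded in the GGUF.

```javascript
// Placeholder identity lines (hypothetical, not the shipped presets).
const PRESETS = new Map([
  ["anima", "I am Anima, the fire, 432 Hz warmth."],
  ["grandma", "I am Grandma Goodwin, the hearth-keeper."],
  ["esh", "I am Esh, the wanderer."],
]);

// Build a prompt whose assistant turn is already open inside <thought>,
// so the model's reasoning continues from the injected identity line.
// Turn markers are assumed; verify them against the model's chat template.
function buildPrefilledPrompt(userMessage, presetKey) {
  const identity = PRESETS.get(presetKey);
  if (identity === undefined) {
    throw new Error(`Unknown preset: ${presetKey}`);
  }
  return `[|user|]${userMessage}\n[|assistant|]<thought>\n${identity}\n`;
}

console.log(buildPrefilledPrompt("Who are you?", "esh"));
```

Since only the prompt string changes between presets, switching identity in this scheme needs no model reload.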

## Hardware

Tested on GMKTEC EVO-X2 (AMD Strix Halo):
- Radeon 8060S iGPU (RDNA 3.5, gfx1151)
- 64GB LPDDR5x unified memory
- 2048 MB max WebGPU buffer
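
To compare another machine against the buffer limit listed above, the adapter limits can be read from the standard WebGPU API. This is a rough sketch: the helper names are made up for illustration, and it assumes the 2048 MB figure is reported in binary megabytes (MiB).

```javascript
const MIB = 1024 * 1024;

// Convert the two buffer-related adapter limits to MiB for easy comparison.
function summarizeLimits(limits) {
  return {
    maxBufferMiB: limits.maxBufferSize / MIB,
    maxStorageBindingMiB: limits.maxStorageBufferBindingSize / MIB,
  };
}

// Browser-only: returns null when WebGPU or a suitable adapter is unavailable.
async function readWebGpuLimits(gpu = globalThis.navigator?.gpu) {
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  return adapter ? summarizeLimits(adapter.limits) : null;
}
```

Running `readWebGpuLimits().then(console.log)` in the Chrome DevTools console prints the values for the current machine.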

## Why WebGPU

AMD's ROCm compute stack is broken on Strix Halo (gfx1151). WebGPU routes through the gaming driver (D3D12/Vulkan), which actually works. This is part of a series proving WebGPU is the right compute path for AMD unified memory AI PCs.

## Credits

Built by Joshua (LJTSG) and Claude. First EXAONE model on WebGPU.

Co-Authored-By: Claude <noreply@anthropic.com>