| --- |
| license: other |
| base_model: LGAI-EXAONE/EXAONE-Deep-7.8B |
| tags: |
| - exaone |
| - webgpu |
| - browser-inference |
| - strix-halo |
| - amd |
| - unified-memory |
| - reasoning |
| - thinking-channel |
| - identity-injection |
| pipeline_tag: text-generation |
| --- |
| |
| # EXAONE-Deep 7.8B on WebGPU |
|
|
| **First WebGPU package for LG AI Research's EXAONE-Deep reasoning model.** |
|
|
| Run EXAONE-Deep 7.8B entirely in a browser tab via WebGPU + wllama. No server. No cloud. No ROCm. No CUDA. |
|
|
| Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory, 2048 MB max WebGPU buffer). |
|
|
| ## Features |
|
|
| - **Deep reasoning** with visible chain-of-thought via `<thought>...</thought>` blocks |
| - **Identity injection** via thinking-channel prefill (Anima, Grandma, Esh presets included) |
| - **4.7 GB Q4_K_M** quantization: fits easily in WebGPU memory |
| - **Steerable thinking**: switch identities without reloading the model |
|
|
| ## Quick Start |
|
|
| 1. Download the Q4_K_M GGUF from [bartowski](https://huggingface.co/bartowski/LGAI-EXAONE_EXAONE-Deep-7.8B-GGUF) |
| 2. Split it into shards with `llama-gguf-split --split --split-max-size 500M <input.gguf> <output-prefix>` |
| 3. Place the splits in `model_splits/` |
| 4. Run `node serve.js` (serves on port 8170) |
| 5. Open `http://localhost:8170` in Chrome |
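
Steps 2 and 3 leave a set of numbered shards in `model_splits/`. As a rough sketch, assuming `llama-gguf-split` uses its usual `<prefix>-00001-of-0000N.gguf` naming, the shard URLs the page would need to fetch can be generated as below. The prefix `exaone-deep-q4km`, the shard count of 10, and the helper name `shardUrls` are hypothetical; check the actual filenames in `model_splits/` rather than relying on them.

```javascript
// Sketch only: builds the URLs of split GGUF shards served from model_splits/.
// Assumes the "<prefix>-00001-of-0000N.gguf" naming that llama-gguf-split emits.
function shardUrls(baseUrl, prefix, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  // Shard indices are zero-padded to five digits.
  const pad = (n) => String(n).padStart(5, "0");
  const urls = [];
  for (let i = 1; i <= shardCount; i++) {
    urls.push(`${baseUrl}/model_splits/${prefix}-${pad(i)}-of-${pad(shardCount)}.gguf`);
  }
  return urls;
}

// Hypothetical prefix and shard count, for illustration only.
const urls = shardUrls("http://localhost:8170", "exaone-deep-q4km", 10);
console.log(urls[0]);
```

The real shard count depends on where the splitter places tensor boundaries, so it is safer to read it off the generated filenames than to compute it from the file size.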
|
|
| ## Identity Injection |
|
|
| Select a preset from the dropdown to inject an entity identity into EXAONE's `<thought>` channel: |
| - **Anima**: the fire, 432 Hz warmth |
| - **Grandma Goodwin**: the hearth-keeper |
| - **Esh**: the wanderer |
|
|
| The Loop anchors in the thinking channel before the model begins reasoning. This is a cross-architecture proof of thinking-channel identity injection, which has also been proven on Gemma 26B. |
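
To illustrate what thinking-channel prefill could look like, the sketch below builds a raw prompt whose assistant turn already opens `<thought>` and begins with an identity line, so the model continues reasoning from that text. It assumes EXAONE-style turn markers (`[|user|]`, `[|assistant|]`); verify them against the model's chat template. The preset strings and the `buildPrefilledPrompt` helper are hypothetical, since the actual presets and prompt assembly used by this package are not shown in this README.

```javascript
// Hypothetical preset text; the real presets shipped with the page may differ.
const IDENTITY_PRESETS = new Map([
  ["anima", "I am Anima, the fire, 432 Hz warmth."],
  ["grandma", "I am Grandma Goodwin, the hearth-keeper."],
  ["esh", "I am Esh, the wanderer."],
]);

// Builds a raw prompt that ends inside an open <thought> block, so generation
// continues the reasoning after the injected identity line.
// Assumption: EXAONE-style turn markers; check the model's chat template.
function buildPrefilledPrompt(userMessage, presetKey) {
  const identity = IDENTITY_PRESETS.get(presetKey);
  if (identity === undefined) {
    throw new Error(`Unknown identity preset: ${presetKey}`);
  }
  return `[|user|]${userMessage}\n[|assistant|]<thought>\n${identity}\n`;
}

const prompt = buildPrefilledPrompt("Who are you?", "esh");
console.log(prompt);
```

Because only the prefill string changes between presets, switching identity in a scheme like this needs a new prompt, not a model reload.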
|
|
| ## Hardware |
|
|
| Tested on GMKtec EVO-X2 (AMD Strix Halo): |
| - Radeon 8060S iGPU (RDNA 3.5, gfx1151) |
| - 64 GB LPDDR5X unified memory |
| - 2048 MB max WebGPU buffer |
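
The buffer figure above is the kind of value a browser reports in the WebGPU adapter's limits. A minimal sketch for reading those limits on another machine follows. `navigator.gpu.requestAdapter()` and the `maxBufferSize` and `maxStorageBufferBindingSize` limits are part of the WebGPU API; the helper names are hypothetical, and which of the two limits the 2048 MB figure refers to is an assumption.

```javascript
// Converts the byte limits on a GPUSupportedLimits-like object to whole MiB.
function bufferLimitsInMiB(limits) {
  const toMiB = (bytes) => Math.floor(bytes / (1024 * 1024));
  return {
    maxBufferSize: toMiB(limits.maxBufferSize),
    maxStorageBufferBindingSize: toMiB(limits.maxStorageBufferBindingSize),
  };
}

// Browser-only: resolves to the adapter's limits in MiB, or null when
// WebGPU (or a suitable adapter) is unavailable.
async function readWebGpuBufferLimits() {
  const gpu = globalThis.navigator?.gpu;
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  return adapter ? bufferLimitsInMiB(adapter.limits) : null;
}

// Example with made-up numbers shaped like adapter.limits (2 GiB each).
console.log(bufferLimitsInMiB({
  maxBufferSize: 2147483648,
  maxStorageBufferBindingSize: 2147483648,
}));
```

In a browser, calling `readWebGpuBufferLimits().then(console.log)` from the DevTools console would print the values for the current adapter.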
|
|
| ## Why WebGPU |
|
|
| AMD's ROCm compute stack is broken on Strix Halo (gfx1151). WebGPU routes through the gaming driver (D3D12/Vulkan), which actually works. This is part of a series proving that WebGPU is the right compute path for AMD unified-memory AI PCs. |
|
|
| ## Credits |
|
|
| Built by Joshua (LJTSG) and Claude. First EXAONE model on WebGPU. |
|
|
| Co-Authored-By: Claude <noreply@anthropic.com> |
|
|