	---
	license: other
	base_model: LGAI-EXAONE/EXAONE-Deep-7.8B
	tags:
	- exaone
	- webgpu
	- browser-inference
	- strix-halo
	- amd
	- unified-memory
	- reasoning
	- thinking-channel
	- identity-injection
	pipeline_tag: text-generation
	---

	# EXAONE-Deep 7.8B on WebGPU

	First WebGPU package for LG AI Research's EXAONE-Deep reasoning model.

	Run EXAONE-Deep 7.8B entirely in a browser tab via WebGPU + wllama. No server. No cloud. No ROCm. No CUDA.

	Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory, 2048 MB maximum WebGPU buffer size).

	## Features

	- Deep reasoning with visible chain-of-thought via `<thought>...</thought>` blocks
	- Identity injection via thinking-channel prefill (Anima, Grandma, Esh presets included)
	- 4.7 GB Q4_K_M quantization that fits comfortably in the test machine's 64 GB of unified memory
	- Steerable thinking: switch identities without reloading the model
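Because the reasoning arrives inline in `<thought>...</thought>` blocks, a UI usually has to separate it from the final answer. The following is a minimal sketch of one way to do that, not the code shipped in this package. The function name `splitThought` and the `thoughtPrefilled` flag are hypothetical; the flag covers the case where the opening tag was already written into the prompt and therefore does not appear in the generated text.

```javascript
// Hypothetical helper (not part of this package): split model output into
// the visible reasoning and the final answer.
// Assumption: reasoning is wrapped in <thought>...</thought> as described above.
function splitThought(output, thoughtPrefilled = false) {
  const OPEN = "<thought>";
  const CLOSE = "</thought>";

  // If the opening tag was prefilled into the prompt, the generated text
  // starts inside the thought block, so restore the tag before parsing.
  const text = thoughtPrefilled && !output.includes(OPEN) ? OPEN + output : output;

  const start = text.indexOf(OPEN);
  if (start === -1) {
    // No reasoning block at all: everything is the answer.
    return { thought: "", answer: text.trim(), complete: true };
  }

  const bodyStart = start + OPEN.length;
  const end = text.indexOf(CLOSE, bodyStart);
  if (end === -1) {
    // The model is still reasoning (useful while streaming tokens).
    return { thought: text.slice(bodyStart).trim(), answer: "", complete: false };
  }

  // Any text before the opening tag is ignored.
  return {
    thought: text.slice(bodyStart, end).trim(),
    answer: text.slice(end + CLOSE.length).trim(),
    complete: true,
  };
}
```

Calling it on every streamed chunk of accumulated text lets the page show the reasoning while it is being generated and switch to the answer once `complete` is true.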

	## Quick Start

	1. Download the Q4_K_M GGUF from [bartowski](https://huggingface.co/bartowski/LGAI-EXAONE_EXAONE-Deep-7.8B-GGUF)
	2. Split it with `llama-gguf-split --split --split-max-size 500M`, passing the downloaded GGUF and an output prefix as the two positional arguments
	3. Place the splits in `model_splits/`
	4. Run `node serve.js` (serves on port 8170)
	5. Open `http://localhost:8170` in Chrome
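After step 3, the page needs the list of shard URLs to fetch. The sketch below builds that list, assuming the shards keep the `<prefix>-00001-of-0000N.gguf` names that `llama-gguf-split` writes. The prefix `exaone-deep-q4` and the shard count are hypothetical examples; use whatever prefix you passed to the split command and the number of files it actually produced.

```javascript
// Hypothetical helper (not part of this package): list the URLs of all
// GGUF shards produced by llama-gguf-split.
// Assumption: shards are named <prefix>-<index>-of-<count>.gguf with
// 5-digit, 1-based, zero-padded numbers.
function shardUrls(baseUrl, prefix, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  const pad = (n) => String(n).padStart(5, "0");
  const urls = [];
  for (let i = 1; i <= count; i++) {
    urls.push(`${baseUrl}/${prefix}-${pad(i)}-of-${pad(count)}.gguf`);
  }
  return urls;
}

// Example with made-up values: three shards served from model_splits/.
const urls = shardUrls("http://localhost:8170/model_splits", "exaone-deep-q4", 3);
console.log(urls.join("\n"));
```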

	## Identity Injection

	Select a preset from the dropdown to inject that entity's identity into EXAONE's `<thought>` channel:
	- Anima — the fire, 432 Hz warmth
	- Grandma Goodwin — the hearth-keeper
	- Esh — the wanderer

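	The Loop anchors in the thinking channel before the model starts reasoning: the selected identity is prefilled into the `<thought>` block, so the model's own reasoning continues from it. This is a cross-architecture proof of thinking-channel identity injection, which has also been proven on Gemma 26B.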
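As an illustration of thinking-channel prefill, the sketch below builds a prompt whose assistant turn already opens a `<thought>` block containing the identity text, so generation continues from inside it. It rests on two assumptions: that the prompt follows the EXAONE chat layout (`[|user|]...` followed by `[|assistant|]<thought>`), which should be checked against the model's own chat template, and that the preset strings are placeholders rather than the presets bundled with this package.

```javascript
// Placeholder identity lines for illustration only; the real presets
// shipped with this package are not reproduced here.
const PRESETS = {
  anima: "I am Anima, the fire.",
  grandma: "I am Grandma Goodwin, the hearth-keeper.",
  esh: "I am Esh, the wanderer.",
};

// Hypothetical helper: build a single-turn prompt with the identity
// prefilled into the thinking channel.
// Assumption: EXAONE-style turn markers ([|user|], [|assistant|]) and a
// <thought> tag that opens the reasoning block.
function buildPrefilledPrompt(userMessage, identityText = "") {
  const prefill = identityText.trim() ? identityText.trim() + "\n" : "";
  // The prompt ends inside the open <thought> block, so the model's
  // first generated tokens continue the identity text.
  return `[|user|]${userMessage.trim()}\n[|assistant|]<thought>\n${prefill}`;
}

console.log(buildPrefilledPrompt("What is 2+2?", PRESETS.anima));
```

Since the identity lives only in the prompt string, switching presets means building a new prompt; the loaded model weights are untouched.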

	## Hardware

	Tested on GMKtec EVO-X2 (AMD Strix Halo):
	- Radeon 8060S iGPU (RDNA 3.5, gfx1151)
	- 64 GB LPDDR5X unified memory
	- 2048 MB max WebGPU buffer
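To compare another machine against the figures above, the adapter's reported limits can be read in the browser. This is a rough sketch: `summarizeLimits` and `queryWebGpuLimits` are hypothetical names, and it assumes the 2048 MB figure means 2048 MiB. `maxBufferSize` and `maxStorageBufferBindingSize` are standard WebGPU limit names.

```javascript
const MIB = 1024 * 1024;

// Hypothetical helper: convert raw WebGPU limits to MiB and compare the
// maximum buffer size with the value reported for the test machine.
// Assumption: "2048 MB" above means 2048 MiB.
function summarizeLimits(limits, testedMaxBufferMiB = 2048) {
  const maxBufferMiB = Math.floor(limits.maxBufferSize / MIB);
  const maxBindingMiB = Math.floor(limits.maxStorageBufferBindingSize / MIB);
  return {
    maxBufferMiB,
    maxBindingMiB,
    matchesTested: maxBufferMiB >= testedMaxBufferMiB,
  };
}

// Browser-only: returns null when WebGPU or an adapter is unavailable.
async function queryWebGpuLimits() {
  if (typeof navigator === "undefined" || !navigator.gpu) return null;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter ? summarizeLimits(adapter.limits) : null;
}
```

Running `queryWebGpuLimits().then(console.log)` in the Chrome DevTools console prints the summary. These are the limits the adapter supports; a device only gets them if they are requested when it is created.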

	## Why WebGPU

	AMD's ROCm compute stack is broken on Strix Halo (gfx1151). WebGPU instead routes through the graphics driver (D3D12 or Vulkan), which works. This package is part of a series proving that WebGPU is the right compute path for AMD unified-memory AI PCs.

	## Credits

	Built by Joshua (LJTSG) and Claude. First EXAONE model on WebGPU.

	Co-Authored-By: Claude <noreply@anthropic.com>