	---
	license: apache-2.0
	base_model: HuggingFaceTB/SmolLM2-360M-Instruct
	tags:
	- smollm2
	- webgpu
	- browser-inference
	- strix-halo
	- amd
	- unified-memory
	- tiny-model
	pipeline_tag: text-generation
	---

	# SmolLM2-360M on WebGPU

	Hugging Face's tiny 360M-parameter SmolLM2 Instruct model, running in the browser via WebGPU.

	Uses the Q8_0 quantization (369 MB). Loads in under 2 seconds, and generation starts almost immediately.
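
To see roughly where a file size in this range comes from, here is a back-of-the-envelope sketch. It assumes ggml's Q8_0 layout (blocks of 32 weights, each stored as 32 int8 values plus one fp16 scale) and the nominal 360M parameter count; `estimateQ8_0Bytes` is a hypothetical helper, not part of this package.

```javascript
// Rough size estimate for a Q8_0-quantized model.
// Assumption: every weight is stored as Q8_0, in blocks of 32 weights.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 32 + 2; // 32 x int8 values + 1 x fp16 scale

function estimateQ8_0Bytes(paramCount) {
  const blocks = Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS);
  return blocks * Q8_0_BLOCK_BYTES;
}

const toMiB = (bytes) => bytes / (1024 * 1024);

// Assumption: the nominal 360M parameter count, not the exact one.
const bytes = estimateQ8_0Bytes(360e6);
console.log(`${(bytes / 1e6).toFixed(1)} MB, ${toMiB(bytes).toFixed(1)} MiB`);
```

The estimate lands in the same ballpark as the 369 MB quoted here. It is only approximate: the exact parameter count, file metadata, and any tensors kept in another format all shift the real number.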

	Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory).

	## Quick Start

	1. Download the Q8_0 GGUF from [bartowski](https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF)
	2. Place it in `model_splits/` (the model is a single file, so no splitting is needed)
	3. Run `node serve.js` (serves on port 8180)
	4. Open `http://localhost:8180` in Chrome
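
The `serve.js` script itself is not reproduced in this card. As a rough idea of what step 3 involves, here is a minimal sketch of a static file server on port 8180. Everything in it (`createModelServer`, `resolveSafePath`, `contentTypeFor`, the content-type table) is hypothetical and may differ from the real script.

```javascript
// Hypothetical stand-in for serve.js: a small static file server.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

const PORT = 8180; // port named in the Quick Start

// Content types for files a WebGPU demo page is likely to serve (assumed set).
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Map a request URL onto a file under rootDir; return null if it escapes rootDir.
function resolveSafePath(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, "http://localhost").pathname);
  } catch {
    return null; // malformed URL or percent-encoding
  }
  const relative = pathname === "/" ? "index.html" : pathname.replace(/^\/+/, "");
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, relative);
  return resolved.startsWith(root + path.sep) ? resolved : null;
}

function createModelServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveSafePath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403).end("Forbidden");
      return;
    }
    fs.stat(filePath, (err, stats) => {
      if (err || !stats.isFile()) {
        res.writeHead(404).end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentTypeFor(filePath),
        "Content-Length": stats.size,
      });
      // Stream the file so the model is never buffered in memory all at once.
      fs.createReadStream(filePath)
        .on("error", () => res.destroy())
        .pipe(res);
    });
  });
}

// To run the sketch: createModelServer(__dirname).listen(PORT);
```

Streaming matters mostly for the GGUF file in `model_splits/`; the HTML and JavaScript files are small enough that any approach works.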

	## Use Cases

	- Lightweight chat and Q&A
	- Classification and summarization
	- Edge/IoT inference
	- Testing and prototyping

	## Hardware

	Any WebGPU-capable device should work. It was tested on AMD Strix Halo, but at only 369 MB the model should also fit on much smaller hardware.
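
To check whether a given browser and device expose WebGPU at all, a quick probe along these lines can help. It is a minimal sketch using the standard `navigator.gpu.requestAdapter()` call; `probeWebGPU` and the shape of its return value are hypothetical and not part of this package.

```javascript
// Reports whether WebGPU is usable and, if so, two adapter buffer limits.
// `nav` defaults to the browser's navigator; passing it in keeps the
// function testable outside a browser.
async function probeWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  return {
    supported: true,
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

// In a browser console: probeWebGPU().then(console.log);
```

The reported limits are only a hint. Whether they matter depends on how the weights are laid out across GPU buffers, which this card does not describe.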

	## Why This Package

	Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU sidesteps the broken ROCm stack and runs through the gaming graphics driver stack instead.

	## Credits

	Built by Joshua (LJTSG) and Claude.

	Co-Authored-By: Claude <noreply@anthropic.com>