---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
- smollm2
- webgpu
- browser-inference
- strix-halo
- amd
- unified-memory
- tiny-model
pipeline_tag: text-generation
---

# SmolLM2-360M on WebGPU

**HuggingFace's tiny 360M parameter model running in browser WebGPU.**

369 MB Q8_0 quantization. Loads in under 2 seconds. Generates instantly.

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory).

## Quick Start

1. Download Q8_0 GGUF from [bartowski](https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF)
2. Place in `model_splits/` (no splitting needed, single file)
3. `node serve.js` (port 8180)
4. Open `http://localhost:8180` in Chrome

## Use Cases

- Lightweight chat and Q&A
- Classification and summarization
- Edge/IoT inference
- Testing and prototyping

## Hardware

Any WebGPU-capable device. Tested on AMD Strix Halo but works on much smaller hardware too. The model is only 369 MB, so it fits anywhere.

## Why This Package

Part of a series making popular models available on WebGPU for AMD unified memory AI PCs. WebGPU bypasses broken ROCm and routes through the gaming driver stack.

## Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude
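
## Example: a minimal static server

Quick Start relies on `node serve.js` to serve the page and the GGUF file on port 8180. The contents of that script are not shown here, so the following is only a rough sketch of what such a server could look like, not the package's actual implementation. The function names (`contentTypeFor`, `resolveInsideRoot`, `createModelServer`), the MIME table, and the assumption that `index.html` and `model_splits/` sit in the served directory are all hypothetical.

```javascript
// Hypothetical stand-in for serve.js (NOT the package's real script):
// a small static file server using only Node.js built-in modules.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Assumed MIME table; GGUF model files are served as raw bytes.
const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

// Pick a Content-Type from the file extension, defaulting to raw bytes.
function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

// Map a request URL to a file path under root.
// Returns null if the URL is malformed or escapes root (e.g. via "..").
function resolveInsideRoot(root, url) {
  let pathname;
  try {
    pathname = decodeURIComponent(url.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const rootResolved = path.resolve(root);
  const target = path.resolve(rootResolved, "." + pathname);
  const inside =
    target === rootResolved || target.startsWith(rootResolved + path.sep);
  return inside ? target : null;
}

// Build (but do not start) an HTTP server that streams files from root.
function createModelServer(root) {
  return http.createServer((req, res) => {
    const url = req.url.split("?")[0] === "/" ? "/index.html" : req.url;
    const filePath = resolveInsideRoot(root, url);
    if (!filePath) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.stat(filePath, (err, stats) => {
      if (err || !stats.isFile()) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentTypeFor(filePath),
        "Content-Length": stats.size,
      });
      // Stream the file so the 369 MB model is never fully buffered in memory.
      fs.createReadStream(filePath).pipe(res);
    });
  });
}
```

Under these assumptions, calling `createModelServer(process.cwd()).listen(8180)` from the directory that contains `index.html` and `model_splits/` would make the page reachable at `http://localhost:8180`. Streaming with `fs.createReadStream` is a design choice of this sketch so that the model file is sent in chunks.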