| license: apache-2.0 | |
| base_model: HuggingFaceTB/SmolLM2-360M-Instruct | |
| tags: | |
| - smollm2 | |
| - webgpu | |
| - browser-inference | |
| - strix-halo | |
| - amd | |
| - unified-memory | |
| - tiny-model | |
| pipeline_tag: text-generation | |
| # SmolLM2-360M on WebGPU | |
| **Hugging Face's 360M-parameter SmolLM2 Instruct model, running in the browser on WebGPU.** | |
| Uses the 369 MB Q8_0 quantization. Loads in under 2 seconds and starts generating almost immediately. | |
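As a rough sanity check on the 369 MB figure: GGUF's Q8_0 format stores weights in blocks of 32, and each block holds 32 one-byte values plus a 2-byte scale, so 34 bytes per 32 weights. The sketch below applies that to a nominal 360M parameters. The function name is illustrative and not part of this package, and the estimate ignores file metadata, tensors stored at other precisions, and the rounding in the "360M" count, so it should only land in the same ballpark as the published file size.

```javascript
// Rough size estimate for a Q8_0-quantized model (illustrative helper).
// Q8_0 packs 32 weights into one block: 32 int8 values + one fp16 scale.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 34;

function estimateQ8_0Bytes(paramCount) {
  // A partially filled block still occupies a full block on disk.
  const blocks = Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS);
  return blocks * Q8_0_BLOCK_BYTES;
}

// Assumption: exactly 360e6 parameters (the nominal count from the model name).
const estimatedBytes = estimateQ8_0Bytes(360e6);
console.log(
  `${(estimatedBytes / 1e6).toFixed(1)} MB, ${(estimatedBytes / 2 ** 20).toFixed(1)} MiB`
);
// prints "382.5 MB, 364.8 MiB"
```

This works out to about 8.5 bits per weight, which is why a Q8_0 file is slightly larger than one byte per parameter.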
| Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory). | |
| ## Quick Start | |
| 1. Download the Q8_0 GGUF from [bartowski](https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF) | |
| 2. Place it in `model_splits/` (it is a single file, so no splitting is needed) | |
| 3. Run `node serve.js` (serves on port 8180) | |
| 4. Open `http://localhost:8180` in Chrome | |
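If the page fails to load the model, one thing worth ruling out is a truncated or wrong download. A GGUF file starts with the 4-byte magic `GGUF` followed by a little-endian 32-bit version number. The helper below is a minimal sketch of that check and is not part of this package; `readGgufHeader` is a hypothetical name.

```javascript
// Minimal GGUF header check (illustrative helper, not part of serve.js).
// Expects a Uint8Array (a Node Buffer also works) with at least the first 8 bytes of the file.
function readGgufHeader(bytes) {
  if (bytes.length < 8) return null; // too short to contain magic + version

  // Bytes 0..3 must spell "GGUF".
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "GGUF") return null;

  // Bytes 4..7 hold the format version as a little-endian uint32.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { version: view.getUint32(4, true) };
}
```

In Node, you could pass it the first bytes of the file you placed in `model_splits/`, read with the `fs` module. A `null` result suggests the file is not a GGUF file or the download was cut short at the very start.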
| ## Use Cases | |
| - Lightweight chat and Q&A | |
| - Classification and summarization | |
| - Edge/IoT inference | |
| - Testing and prototyping | |
| ## Hardware | |
| Runs on any WebGPU-capable device. Tested on AMD Strix Halo, but it also works on much smaller hardware: the model is only 369 MB, so it fits comfortably on modest devices. | |
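To see whether a given browser and device count as WebGPU-capable, you can probe the standard `navigator.gpu` entry point. The following is a minimal sketch, not code from this package: `checkWebGPU` is a hypothetical helper, and it takes the navigator object as a parameter so it can also be exercised outside a browser.

```javascript
// Probe WebGPU support (illustrative helper, not part of this package).
// Pass the browser's `navigator`; returns a small status object.
async function checkWebGPU(nav) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: "navigator.gpu is not available in this browser" };
  }

  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "no WebGPU adapter found" };
  }

  // Report the largest single buffer the adapter allows, for information only.
  return { ok: true, maxBufferSize: adapter.limits.maxBufferSize };
}
```

In the browser console, `checkWebGPU(navigator).then(console.log)` shows the result. What buffer size this package actually needs is not stated here, so treat `maxBufferSize` as a diagnostic value rather than a pass/fail threshold.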
| ## Why This Package | |
| Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and routes inference through the gaming driver stack instead. | |
| ## Credits | |
| Built by Joshua (LJTSG) and Claude. | |
| Co-Authored-By: Claude <noreply@anthropic.com> | |