---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
- smollm2
- webgpu
- browser-inference
- strix-halo
- amd
- unified-memory
- tiny-model
pipeline_tag: text-generation
---
# SmolLM2-360M on WebGPU
**Hugging Face's tiny 360M-parameter model running in the browser via WebGPU.**
369 MB Q8_0 quantization. Loads in under 2 seconds and starts generating almost immediately on the test machine.
Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory).
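
As a rough sanity check on the file size, the sketch below estimates the weight storage for a Q8_0 model. It assumes the standard GGUF Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, so 34 bytes per block) and uses the nominal 360M parameter count from the model name. The real checkpoint's exact parameter count, tensor type mix, and metadata are not accounted for, so treat the result as a ballpark only. The function name `estimateQ8_0Bytes` is illustrative.

```javascript
// Q8_0 layout (assumed): each block holds 32 int8 weights + one 16-bit scale = 34 bytes.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 34;

// Estimate bytes needed to store `paramCount` weights in Q8_0.
function estimateQ8_0Bytes(paramCount) {
  return Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS) * Q8_0_BLOCK_BYTES;
}

// Nominal parameter count taken from the model name, not measured.
const estimatedBytes = estimateQ8_0Bytes(360e6);
console.log((estimatedBytes / 2 ** 20).toFixed(1) + ' MiB'); // prints "364.8 MiB"
```

That lands in the same ballpark as the 369 MB figure quoted above.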
## Quick Start
1. Download the Q8_0 GGUF from [bartowski](https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF)
2. Place it in `model_splits/` (single file, no splitting needed)
3. Run `node serve.js` (serves on port 8180)
4. Open `http://localhost:8180` in Chrome
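
For readers curious what a local server for step 3 might involve, here is a minimal sketch of a static file server in Node. It is not the repository's `serve.js`, whose contents are not shown here. The helper names (`contentTypeFor`, `resolveInside`, `createModelServer`) and the MIME table are assumptions made for illustration. Only the port number comes from the steps above.

```javascript
// Hypothetical static server sketch; this is NOT the repository's serve.js.
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

const PORT = 8180; // port named in the Quick Start

// Small content-type table; anything unknown (including .gguf) is sent as raw bytes.
const MIME = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.wasm': 'application/wasm',
};

function contentTypeFor(filePath) {
  return MIME[path.extname(filePath).toLowerCase()] || 'application/octet-stream';
}

// Map a URL path to a file under rootDir; return null if it would escape rootDir.
function resolveInside(rootDir, urlPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes('\0')) return null; // fs calls reject null bytes
  const root = path.resolve(rootDir);
  const target = path.resolve(root, '.' + decoded);
  if (target !== root && !target.startsWith(root + path.sep)) return null;
  return target;
}

// Build (but do not start) an HTTP server that serves files from rootDir.
function createModelServer(rootDir) {
  return http.createServer((req, res) => {
    const urlPath = req.url.split('?')[0];
    const filePath = resolveInside(rootDir, urlPath === '/' ? '/index.html' : urlPath);
    if (!filePath) {
      res.writeHead(403).end('Forbidden');
      return;
    }
    fs.stat(filePath, (err, stat) => {
      if (err || !stat.isFile()) {
        res.writeHead(404).end('Not found');
        return;
      }
      res.writeHead(200, {
        'Content-Type': contentTypeFor(filePath),
        'Content-Length': stat.size,
      });
      fs.createReadStream(filePath).pipe(res); // stream so the model is not buffered in RAM
    });
  });
}
```

To try it, call `createModelServer(__dirname).listen(PORT)` from the directory that contains `index.html` and `model_splits/`. Streaming the file matters for a model of several hundred megabytes, since it avoids reading the whole file into memory per request.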
## Use Cases
- Lightweight chat and Q&A
- Classification and summarization
- Edge/IoT inference
- Testing and prototyping
## Hardware
Any WebGPU-capable device should work. Tested on AMD Strix Halo, but at only 369 MB the model also fits on much smaller hardware.
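
To check whether a given device counts as WebGPU-capable, something like the following sketch can be pasted into the browser console. It uses the standard `navigator.gpu.requestAdapter()` call and reads two adapter limits that are relevant when loading large weight buffers. The function name `checkWebGPU` and the shape of the returned object are illustrative, and which limits this package's loader actually depends on is not stated here.

```javascript
// Sketch: report whether a navigator-like object exposes WebGPU and a usable adapter.
async function checkWebGPU(nav) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: 'navigator.gpu is not available' };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: 'no suitable GPU adapter found' };
  }
  return {
    ok: true,
    // Upper bounds the adapter supports for buffer sizes, in bytes.
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}
```

In a browser, `checkWebGPU(navigator).then(console.log)` prints the result. Passing the navigator in as a parameter keeps the function easy to exercise outside a browser with a stub object.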
## Why This Package
Part of a series making popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and routes inference through the regular graphics (gaming) driver stack instead.
## Credits
Built by Joshua (LJTSG) and Claude.
Co-Authored-By: Claude <noreply@anthropic.com>