---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
  - smollm2
  - webgpu
  - browser-inference
  - strix-halo
  - amd
  - unified-memory
  - tiny-model
pipeline_tag: text-generation
---

# SmolLM2-360M on WebGPU

**Hugging Face's 360M-parameter SmolLM2 Instruct model, running in the browser on WebGPU.**

369 MB Q8_0 quantization. On the test machine described below, it loads in under 2 seconds and starts generating with no noticeable delay.

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory).

## Quick Start

1. Download the Q8_0 GGUF from [bartowski](https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF)
2. Place it in `model_splits/` (the model is a single file, so no splitting is needed)
3. Run `node serve.js` (serves on port 8180)
4. Open `http://localhost:8180` in Chrome
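
If the model fails to load, one thing worth ruling out is an incomplete or wrong download. The sketch below is not part of this package; it is a minimal Node helper that relies only on the fact that GGUF files begin with the ASCII magic `GGUF` followed by a little-endian 32-bit version number. The function names and the file name in the usage comment are assumptions made for illustration.

```javascript
// Sanity-check a downloaded GGUF file by reading its 8-byte header.
// Illustrative sketch only; not part of serve.js.
const fs = require("node:fs");

// Parse the first 8 bytes: 4-byte ASCII magic "GGUF" + little-endian uint32 version.
function parseGgufHeader(bytes) {
  if (bytes.length < 8) {
    return { ok: false, reason: "fewer than 8 bytes available" };
  }
  const buf = Buffer.from(bytes);
  const magic = buf.toString("latin1", 0, 4);
  if (magic !== "GGUF") {
    return { ok: false, reason: `unexpected magic ${JSON.stringify(magic)}` };
  }
  return { ok: true, version: buf.readUInt32LE(4) };
}

// Read only the header from disk, so the full model is never loaded into memory.
function checkGgufFile(path) {
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, head, 0, 8, 0);
    const header = parseGgufHeader(head.subarray(0, bytesRead));
    return { ...header, sizeBytes: fs.fstatSync(fd).size };
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical usage (the file name is an assumption; use the name you downloaded):
// console.log(checkGgufFile("model_splits/SmolLM2-360M-Instruct-Q8_0.gguf"));
```

A result with `ok: false`, or a `sizeBytes` far below the expected 369 MB, suggests the file should be downloaded again.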

## Use Cases

- Lightweight chat and Q&A
- Classification and summarization
- Edge/IoT inference
- Testing and prototyping

## Hardware

Any WebGPU-capable device should work. The package was tested on AMD Strix Halo, but at only 369 MB the model also fits on much smaller hardware.
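
To find out whether a given browser and device qualify, a short feature check can be run from the browser console. This is a hedged sketch using the standard WebGPU entry points (`navigator.gpu`, `requestAdapter()`, and the adapter's `limits`); the helper name `checkWebGPU` is an assumption and is not something this package provides.

```javascript
// Report whether a navigator-like object exposes a usable WebGPU adapter.
// Taking the navigator as a parameter keeps the helper easy to exercise outside a browser.
async function checkWebGPU(nav) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: "navigator.gpu is missing (WebGPU unsupported or disabled)" };
  }
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "no suitable GPU adapter found" };
  }
  // Both fields are standard GPUSupportedLimits entries, reported in bytes.
  return {
    ok: true,
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

// In a browser console or page script:
// checkWebGPU(navigator).then(console.log);
```

The reported buffer limits give a rough idea of how large a single weight buffer the device will accept. How this package actually splits the weights across buffers is not documented here, so treat the numbers as a hint, not a guarantee.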

## Why This Package

This package is part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and routes compute through the gaming driver stack instead.

## Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude <noreply@anthropic.com>