license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
  - smollm2
  - webgpu
  - browser-inference
  - strix-halo
  - amd
  - unified-memory
  - tiny-model
pipeline_tag: text-generation

SmolLM2-360M on WebGPU

Hugging Face's 360M-parameter SmolLM2 model, running in the browser on WebGPU.

The Q8_0 quantization is 369 MB. On the test hardware it loads in under 2 seconds and starts generating with no noticeable delay.
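As a rough sanity check on the file size, the sketch below estimates a Q8_0 footprint from a parameter count. It assumes the standard ggml Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, so 34 bytes per 32 weights) and a rounded count of 360M parameters. The function name `estimateQ8_0Bytes` is made up for this example.

```javascript
// Assumption: ggml Q8_0 stores 32 int8 weights plus one fp16 scale per block.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 2 + 32; // 2-byte scale + 32 one-byte weights

// Hypothetical helper: estimated bytes if every weight were stored as Q8_0.
function estimateQ8_0Bytes(paramCount) {
  return Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS) * Q8_0_BLOCK_BYTES;
}

const bytes = estimateQ8_0Bytes(360e6); // 360M is a rounded parameter count
console.log((bytes / 1e6).toFixed(1) + " MB"); // prints "382.5 MB"
```

The estimate lands in the same range as the 369 MB figure. It will not match exactly, since the real file also carries metadata, may keep some tensors at other precisions, and the true parameter count is not exactly 360M.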

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory).

Quick Start

1. Download the Q8_0 GGUF from bartowski.
2. Place it in model_splits/ (no splitting needed, since it is a single file).
3. Run node serve.js (it listens on port 8180).
4. Open http://localhost:8180 in Chrome.
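For readers curious what the server step involves, here is a minimal sketch of a static file server on port 8180. It is not the package's actual serve.js, whose contents are not shown here. The helper names (`contentTypeFor`, `resolveInsideRoot`, `createStaticServer`), the MIME map, and the choice to serve the package directory are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for serve.js; the real script may differ.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

const PORT = 8180; // port named in the Quick Start

// Assumed MIME map; anything unknown is sent as a binary stream.
const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

// Map a URL path to a file under rootDir, or null if it is malformed
// or would escape rootDir (for example via "../").
function resolveInsideRoot(rootDir, urlPath) {
  const root = path.resolve(rootDir);
  let relative;
  try {
    relative = decodeURIComponent(urlPath.split("?")[0]).replace(/^\/+/, "");
  } catch {
    return null; // malformed percent-encoding
  }
  const resolved = path.resolve(root, relative || "index.html");
  return resolved.startsWith(root + path.sep) ? resolved : null;
}

function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveInsideRoot(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.stat(filePath, (err, stats) => {
      if (err || !stats.isFile()) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentTypeFor(filePath),
        "Content-Length": stats.size,
      });
      // Stream the file so the model is never held in memory whole.
      fs.createReadStream(filePath).pipe(res);
    });
  });
}

// To start serving the directory that contains this file:
// createStaticServer(__dirname).listen(PORT);
```

Uncommenting the last line and running the file with node would serve index.html and model_splits/ from the same directory, assuming the page sits next to the script.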

Use Cases

Lightweight chat and Q&A
Classification and summarization
Edge/IoT inference
Testing and prototyping

Hardware

Runs on any WebGPU-capable device. It was tested on AMD Strix Halo, but at only 369 MB the model also fits on much smaller hardware.
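One way to find out whether a given device qualifies is to ask the browser for a WebGPU adapter. The sketch below uses the standard `navigator.gpu.requestAdapter()` call and reads two standard adapter limits. The wrapper `checkWebGPU` and its return shape are made up for this example and are not part of the package.

```javascript
// Hypothetical helper: reports whether WebGPU is usable.
// `nav` is a navigator-like object; in a browser, pass the global `navigator`.
async function checkWebGPU(nav) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: "navigator.gpu is not available" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "no suitable GPU adapter was found" };
  }
  return {
    ok: true,
    // Standard GPUSupportedLimits fields, in bytes.
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}
```

In a page or the DevTools console, `checkWebGPU(navigator).then(console.log)` prints the result. The two limits indicate how large a single GPU buffer can be on that device.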

Why This Package

Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and runs through the same graphics driver stack that games use.

Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude <noreply@anthropic.com>