metadata
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
  - smollm2
  - webgpu
  - browser-inference
  - strix-halo
  - amd
  - unified-memory
  - tiny-model
pipeline_tag: text-generation

SmolLM2-360M on WebGPU

Hugging Face's SmolLM2-360M-Instruct, a tiny 360M-parameter model, running in the browser on WebGPU.

The Q8_0 quantization is a single 369 MB file. On the test machine it loads in under 2 seconds and starts generating almost immediately.
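As a rough sanity check on the 369 MB figure, the sketch below estimates the storage needed for Q8_0 weights. It assumes the usual Q8_0 block layout (32 int8 weights plus one 2-byte fp16 scale per block) and uses the nominal 360M parameter count from the model name, which is rounded. The function name is illustrative and not part of this package.

```javascript
// Q8_0 layout: each block holds 32 weights as int8 plus one fp16 scale.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 2 + 32; // 2-byte scale + 32 one-byte weights

// Estimate the bytes needed to store `paramCount` weights in Q8_0.
// Ignores GGUF metadata and any tensors kept in other formats.
function estimateQ8_0Bytes(paramCount) {
  const blocks = Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS);
  return blocks * Q8_0_BLOCK_BYTES;
}

// Nominal parameter count from the model name (an assumption, not the exact count).
const estimate = estimateQ8_0Bytes(360e6);
console.log(`${(estimate / 1e6).toFixed(1)} MB`); // prints "382.5 MB"
```

That is about 382.5 MB in decimal units, or roughly 365 MiB, which is in the same range as the 369 MB quoted above. The estimate works out to 8.5 bits per weight, so a Q8_0 file is slightly larger than one byte per parameter.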

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory).

Quick Start

  1. Download the Q8_0 GGUF file from bartowski's quantizations on Hugging Face
  2. Place it in model_splits/ (it is a single file, so no splitting is needed)
  3. Run node serve.js (serves on port 8180)
  4. Open http://localhost:8180 in Chrome
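Before starting the server, it can help to confirm that the download is a valid GGUF file and not a truncated file or an HTML error page. The sketch below reads the fixed-size GGUF header with Node's standard library. It is not part of this package: the function names are illustrative, and the layout assumes GGUF version 2 or later, where the two counts are 64-bit.

```javascript
const fs = require("node:fs");

// Parse the fixed-size start of a GGUF file from a Buffer.
// Layout (little-endian): 4-byte magic "GGUF", uint32 version,
// uint64 tensor count, uint64 metadata key/value count (version 2 and later).
function readGgufHeader(buf) {
  if (buf.length < 24) {
    throw new Error("file too small to be GGUF");
  }
  const magic = buf.toString("ascii", 0, 4);
  if (magic !== "GGUF") {
    throw new Error(`not a GGUF file (magic: ${JSON.stringify(magic)})`);
  }
  return {
    version: buf.readUInt32LE(4),
    tensorCount: buf.readBigUInt64LE(8),
    metadataKvCount: buf.readBigUInt64LE(16),
  };
}

// Read only the first 24 bytes of the file at `path` and parse them.
function checkGgufFile(path) {
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(24);
    const bytesRead = fs.readSync(fd, head, 0, 24, 0);
    return readGgufHeader(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}

// Example (substitute the name of the file you placed in model_splits/):
// console.log(checkGgufFile("model_splits/your-model.Q8_0.gguf"));
```

A thrown error here usually means the download was incomplete or the wrong file was saved.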

Use Cases

  • Lightweight chat and Q&A
  • Classification and summarization
  • Edge/IoT inference
  • Testing and prototyping

Hardware

Runs on any WebGPU-capable device. It was tested on AMD Strix Halo, but at only 369 MB the model should also fit on much smaller hardware.
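To check whether a given browser and device qualify, a page can probe the standard WebGPU entry point before loading the model. The sketch below is not taken from this package. It uses only `navigator.gpu.requestAdapter()`, `adapter.info`, and `adapter.limits` from the WebGPU API, and the `probeWebGPU` name and the shape of its return value are assumptions made for illustration.

```javascript
// Probe a navigator-like object for a usable WebGPU adapter.
// Taking `nav` as a parameter keeps the function testable outside a browser.
async function probeWebGPU(nav) {
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  // requestAdapter() resolves to null when no suitable adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  return {
    supported: true,
    // adapter.info may be missing in older browser versions, hence the fallback.
    vendor: adapter.info?.vendor ?? "unknown",
    // Largest single GPU buffer the adapter allows, in bytes.
    maxBufferSize: adapter.limits.maxBufferSize,
  };
}

// In a page: probeWebGPU(navigator).then(console.log);
```

If `supported` is false in Chrome, the usual causes are an outdated browser version or a GPU or driver that the browser has blocked.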

Why This Package

Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU sidesteps the broken ROCm stack: inference runs through the same graphics driver stack that games use.

Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude <noreply@anthropic.com>