# TinySwallow-1.5B-Instruct (ONNX, q4 + int8 embed)

ONNX-quantized version of SakanaAI/TinySwallow-1.5B-Instruct for in-browser inference with transformers.js v3 + WebGPU.

## Files

| File | Size | Notes |
|---|---|---|
| `onnx/model_q4.onnx` | ~1.06 GB | 4-bit MatMul (block_size=128, symmetric) + int8 per-row embeddings; all weights stored inline. |
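To make the quantization scheme above concrete, here is a minimal plain-Python sketch of symmetric blockwise 4-bit quantization (one scale per 128-value block) and symmetric per-row int8 quantization. The function names, and the exact integer ranges `[-7, 7]` / `[-127, 127]`, are illustrative assumptions for this sketch, not the precise onnxruntime implementation.

```python
def quantize_block_sym4(weights, block_size=128):
    """Symmetric 4-bit block quantization (illustrative sketch).

    Each block of `block_size` values shares one float scale; values are
    mapped to signed ints in [-7, 7] with the zero-point fixed at 0.
    """
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # Largest magnitude in the block sets the scale (avoid 0 for all-zero blocks).
        scale = max(abs(v) for v in block) / 7 or 1.0
        scales.append(scale)
        quantized.append([max(-7, min(7, round(v / scale))) for v in block])
    return quantized, scales


def quantize_row_int8(row):
    """Symmetric per-row int8 quantization (illustrative sketch), as used
    here for the embedding matrix: one scale per row, ints in [-127, 127]."""
    scale = max(abs(v) for v in row) / 127 or 1.0
    return [max(-127, min(127, round(v / scale))) for v in row], scale


def dequantize_blocks(quantized, scales):
    """Reconstruct approximate floats from blockwise int4 values."""
    out = []
    for block, scale in zip(quantized, scales):
        out.extend(q * scale for q in block)
    return out
```

The trade-off this illustrates: smaller blocks (or per-row scales) track the local weight range more tightly and reduce rounding error, at the cost of storing more scale values alongside the packed integers.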

## Usage

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "moeadham/TinySwallow-1.5B-Instruct-ONNX",
  { device: "webgpu", dtype: "q4" }
);

// Prompt: "Please briefly explain knowledge distillation."
const out = await generator(
  [{ role: "user", content: "知識蒸留について簡単に教えてください。" }],
  { max_new_tokens: 256 }
);

// With chat-style input, generated_text is the full message array;
// the last entry is the assistant's reply.
console.log(out[0].generated_text.at(-1).content);
```
