# TinySwallow-1.5B-Instruct (ONNX, q4 + int8 embed)

ONNX-quantized version of SakanaAI/TinySwallow-1.5B-Instruct for in-browser inference with transformers.js v3 + WebGPU.

## Files

| File | Size | Notes |
|---|---|---|
| `onnx/model_q4.onnx` | ~1.06 GB | 4-bit MatMul (block_size=128, symmetric) + int8 per-row embeddings; all weights stored inline. |
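To make the quantization scheme above concrete, here is a minimal plain-Python sketch of symmetric blockwise 4-bit quantization (one scale per 128-value block) and symmetric per-row int8 quantization. The function names, and the exact integer ranges `[-7, 7]` / `[-127, 127]`, are illustrative assumptions for this sketch, not the precise onnxruntime implementation.

```python
def quantize_block_sym4(weights, block_size=128):
    """Symmetric 4-bit block quantization (illustrative sketch).

    Each block of `block_size` values shares one float scale; values are
    mapped to signed ints in [-7, 7] with the zero-point fixed at 0.
    """
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # Largest magnitude in the block sets the scale (avoid 0 for all-zero blocks).
        scale = max(abs(v) for v in block) / 7 or 1.0
        scales.append(scale)
        quantized.append([max(-7, min(7, round(v / scale))) for v in block])
    return quantized, scales


def quantize_row_int8(row):
    """Symmetric per-row int8 quantization (illustrative sketch), as used
    here for the embedding matrix: one scale per row, ints in [-127, 127]."""
    scale = max(abs(v) for v in row) / 127 or 1.0
    return [max(-127, min(127, round(v / scale))) for v in row], scale


def dequantize_blocks(quantized, scales):
    """Reconstruct approximate floats from blockwise int4 values."""
    out = []
    for block, scale in zip(quantized, scales):
        out.extend(q * scale for q in block)
    return out
```

The trade-off this illustrates: smaller blocks (or per-row scales) track the local weight range more tightly and reduce rounding error, at the cost of storing more scale values alongside the packed integers.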

## Usage

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "moeadham/TinySwallow-1.5B-Instruct-ONNX",
  { device: "webgpu", dtype: "q4" }
);

// Prompt: "Please briefly explain knowledge distillation."
const out = await generator(
  [{ role: "user", content: "知識蒸留について簡単に教えてください。" }],
  { max_new_tokens: 256 }
);

// With chat-style input, generated_text is the full message array;
// the last entry is the assistant's reply.
console.log(out[0].generated_text.at(-1).content);
```
