# TinySwallow-1.5B-Instruct (ONNX, q4 + int8 embeddings)

ONNX-quantized version of [SakanaAI/TinySwallow-1.5B-Instruct](https://huggingface.co/SakanaAI/TinySwallow-1.5B-Instruct) for in-browser inference with transformers.js v3 + WebGPU.
## Files

| File | Size | Notes |
|---|---|---|
| `onnx/model_q4.onnx` | ~1.06 GB | 4-bit MatMul weights (block_size=128, symmetric) + int8 per-row quantized embeddings; all weights stored inline. |
## Usage

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "moeadham/TinySwallow-1.5B-Instruct-ONNX",
  { device: "webgpu", dtype: "q4" }
);

const out = await generator(
  // "Please briefly explain knowledge distillation."
  [{ role: "user", content: "知識蒸留について簡単に教えてください。" }],
  { max_new_tokens: 256 }
);

// With chat-style input, generated_text is the full message list;
// the last entry is the assistant's reply.
console.log(out[0].generated_text.at(-1).content);
```
## Model tree

- Base model: Qwen/Qwen2.5-1.5B
  - finetuned: Qwen/Qwen2.5-1.5B-Instruct
    - finetuned: SakanaAI/TinySwallow-1.5B-Instruct
      - quantized: moeadham/TinySwallow-1.5B-Instruct-ONNX (this model)