Qwen2.5-3B-Instruct ONNX

ONNX conversion of Qwen/Qwen2.5-3B-Instruct for use with Transformers.js.

Quantization

  • q4f16: 4-bit weights with fp16 compute (~2.2 GB)

Usage

import { AutoTokenizer, AutoModelForCausalLM } from '@huggingface/transformers';

// Load the tokenizer that matches the converted model.
const tokenizer = await AutoTokenizer.from_pretrained('opalitestudios/Qwen2.5-3B-Instruct-ONNX');

// Load the ONNX weights, selecting the q4f16 quantization listed above.
const model = await AutoModelForCausalLM.from_pretrained('opalitestudios/Qwen2.5-3B-Instruct-ONNX', {
  dtype: 'q4f16',
});
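
Once the tokenizer and model are loaded, a single chat turn could look roughly like the sketch below. The helper names `buildMessages` and `generateReply`, the default system prompt, and the 128-token limit are illustrative assumptions, not part of this repository or of Transformers.js; the calls on `tokenizer` and `model` follow the Transformers.js v3 API as I understand it, so check them against the version you install.

```javascript
// Sketch only: `buildMessages` and `generateReply` are hypothetical helpers,
// not functions provided by Transformers.js or by this repository.

// Build a chat-format message list for the instruct model.
// The default system prompt is an assumption; replace it as needed.
function buildMessages(userPrompt, systemPrompt = 'You are a helpful assistant.') {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userPrompt },
  ];
}

// Run one chat turn with an already-loaded tokenizer and model.
async function generateReply(tokenizer, model, userPrompt, maxNewTokens = 128) {
  // Apply the model's chat template and tokenize in one step.
  const inputs = tokenizer.apply_chat_template(buildMessages(userPrompt), {
    add_generation_prompt: true,
    return_dict: true,
  });

  const outputs = await model.generate({ ...inputs, max_new_tokens: maxNewTokens });

  // Drop the prompt tokens so only the newly generated text is decoded.
  const promptLength = inputs.input_ids.dims.at(-1);
  const newTokens = outputs.slice(null, [promptLength, null]);

  return tokenizer.batch_decode(newTokens, { skip_special_tokens: true })[0];
}
```

With the `tokenizer` and `model` from the snippet above, a call would be `const reply = await generateReply(tokenizer, model, 'Explain ONNX in one sentence.');`. Passing the loaded objects in as arguments keeps the helper independent of how or where the model was loaded.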
