EuroLLM-1.7B-Instruct-ONNX

This repository contains ONNX weights for utter-project/EuroLLM-1.7B-Instruct prepared for use with Transformers.js.

Available dtypes in this export: fp32, q4, q8.

The repository layout follows the standard Transformers.js convention:

  • tokenizer and config files in the repository root
  • ONNX model files inside onnx/
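To make the layout concrete, here is a minimal sketch of how a dtype might map to a file under onnx/. The suffix table reflects the default naming Transformers.js v3 uses as far as I know. The helper name onnxPathFor and the base name "model" are assumptions for illustration, and the actual file names in this repository are not confirmed here.

```javascript
// Assumed dtype -> file-name suffix mapping (Transformers.js v3 defaults, not verified against this repo).
const DTYPE_SUFFIX = { fp32: "", fp16: "_fp16", q8: "_quantized", q4: "_q4" };

// Hypothetical helper: build the expected relative path of the ONNX file for a dtype.
function onnxPathFor(dtype, baseName = "model") {
  if (!Object.hasOwn(DTYPE_SUFFIX, dtype)) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return `onnx/${baseName}${DTYPE_SUFFIX[dtype]}.onnx`;
}

// The dtypes listed for this export.
for (const dtype of ["fp32", "q4", "q8"]) {
  console.log(dtype, "->", onnxPathFor(dtype));
}
```

If a load fails with a missing-file error, comparing the requested dtype against the files actually present in onnx/ is a reasonable first check.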

Usage (Transformers.js)

import { pipeline } from "@huggingface/transformers";

// Use the full repository ID, and pick a dtype that this export provides.
const generator = await pipeline("text-generation", "flackzz/EuroLLM-1.7B-Instruct-ONNX", {
  device: "webgpu",
  dtype: "q4", // or "fp32", "q8"
});

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Schreibe einen kurzen Satz auf Deutsch." },
];

const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text.at(-1).content);

Recommended choices:

  • fp32: highest precision, typically for WebGPU
  • fp16: smaller WebGPU model with good speed/quality tradeoff (not among the dtypes listed for this export)
  • q8: smaller CPU/WASM model
  • q4: smallest model, best for constrained devices
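The choices above can be sketched as a small selection helper. This is only an illustration: chooseBackend and its lowMemory flag are hypothetical names, it considers only the dtypes listed for this export (fp32, q4, q8), and the "wasm" device name assumes a browser environment.

```javascript
// Hypothetical helper: pick a device/dtype pair following the recommendations above.
function chooseBackend({ hasWebGPU, lowMemory = false }) {
  const device = hasWebGPU ? "webgpu" : "wasm";
  if (lowMemory) return { device, dtype: "q4" };    // smallest model for constrained devices
  if (hasWebGPU) return { device, dtype: "fp32" };  // highest precision on WebGPU
  return { device, dtype: "q8" };                   // smaller model for CPU/WASM
}

// In a browser, WebGPU support can be detected through navigator.gpu.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
const options = chooseBackend({ hasWebGPU });
console.log(options);
```

The resulting object can be passed as the third argument to pipeline() in place of the hard-coded device and dtype.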

Source Model

utter-project/EuroLLM-1.7B-Instruct
