metadata
library_name: transformers.js
pipeline_tag: text-generation
license: apache-2.0
base_model: utter-project/EuroLLM-1.7B-Instruct
tags:
  - transformers.js
  - onnx
  - llama
  - conversational
  - text-generation

EuroLLM-1.7B-Instruct-ONNX

This repository contains ONNX weights for utter-project/EuroLLM-1.7B-Instruct prepared for use with Transformers.js.

Available dtypes in this export: fp32, q4, q8.

The repository layout follows the standard Transformers.js convention:

  • tokenizer and config files in the repository root
  • ONNX model files inside onnx/
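As a rough illustration of that layout, the sketch below maps each dtype listed for this export to the file a loader would look for under onnx/. The suffix table reflects the usual Transformers.js v3 defaults (no suffix for fp32, `_q4` for q4, `_quantized` for q8) and the base name `model` is an assumption. The actual file names in this repository are not stated here, and `onnxFileFor` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper, not part of Transformers.js.
// Suffixes are assumed to follow the library's default dtype naming.
const DTYPE_SUFFIX = {
  fp32: "",
  q4: "_q4",
  q8: "_quantized",
};

// Returns the assumed relative path of the ONNX file for a given dtype.
function onnxFileFor(dtype, baseName = "model") {
  if (!Object.hasOwn(DTYPE_SUFFIX, dtype)) {
    throw new Error(`dtype "${dtype}" is not listed for this export`);
  }
  return `onnx/${baseName}${DTYPE_SUFFIX[dtype]}.onnx`;
}
```

Under these assumptions, requesting a dtype outside fp32, q4, and q8 fails early instead of producing a path that may not exist.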

Usage (Transformers.js)

import { pipeline } from "@huggingface/transformers";

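// Pass the full Hub repo id (namespace/name) or a local path as the model argument.
const generator = await pipeline("text-generation", "EuroLLM-1.7B-Instruct-ONNX", {
  device: "webgpu",
  dtype: "q4", // this export provides "fp32", "q4", and "q8"
});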

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Schreibe einen kurzen Satz auf Deutsch." },
];

const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text.at(-1).content);

Recommended choices:

  • fp32: highest precision and largest download, typically for WebGPU
  • fp16: smaller WebGPU model with a good speed/quality tradeoff (not among the dtypes listed for this export)
  • q8: smaller model, suited to CPU/WASM
  • q4: smallest model, best for constrained devices
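The recommendations above can be sketched as a small selection function. This is only one possible reading of the list: `chooseLoadOptions` and `detectWebGPU` are hypothetical helpers, the `lowMemory` flag is an assumption the caller has to supply, and the choice is limited to the dtypes listed for this export (fp32, q4, q8).

```javascript
// Hypothetical helper: picks `device` and `dtype` following the list above.
function chooseLoadOptions({ hasWebGPU, lowMemory }) {
  const device = hasWebGPU ? "webgpu" : "wasm";
  // Constrained devices get the smallest model regardless of backend.
  if (lowMemory) return { device, dtype: "q4" };
  // Otherwise: full precision on WebGPU, 8-bit quantization on CPU/WASM.
  return hasWebGPU ? { device, dtype: "fp32" } : { device, dtype: "q8" };
}

// Feature check only: the presence of `navigator.gpu` does not guarantee
// that a usable GPU adapter can actually be obtained.
function detectWebGPU() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}
```

The returned object could then be passed as the options argument of `pipeline(...)`, for example `chooseLoadOptions({ hasWebGPU: detectWebGPU(), lowMemory: false })`.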

Source Model