Qwen2.5-1.5B-Instruct — ONNX (FP16)

ONNX export of Qwen2.5-1.5B-Instruct (1.5B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.

Converted for use with inference4j, an inference-only AI library for Java.

Original Source

Usage with inference4j

```java
try (var gen = OnnxTextGenerator.qwen2().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
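The KV cache mentioned above stores past attention keys and values so each new token only attends over cached state instead of recomputing the full prefix. Its memory cost follows directly from the table below (28 layers, 2 KV heads via grouped-query attention, head dimension 1536 / 12 = 128, FP16); the sketch below is a generic back-of-envelope estimate, not an inference4j API:

```java
// Back-of-envelope KV cache sizing for Qwen2.5-1.5B-Instruct.
// Values come from the model table; the formula is the standard one for
// grouped-query attention and is not specific to any runtime.
public class KvCacheSize {
    public static void main(String[] args) {
        int layers = 28;
        int kvHeads = 2;             // grouped-query attention: 2 KV heads
        int headDim = 1536 / 12;     // hidden size / attention heads = 128
        int bytesPerValue = 2;       // FP16
        int contextLength = 32_768;

        // Each token stores one key and one value vector per layer per KV head.
        long bytesPerToken = (long) layers * 2 * kvHeads * headDim * bytesPerValue;
        long fullContextBytes = bytesPerToken * contextLength;

        System.out.println("KV cache per token: " + bytesPerToken + " bytes");
        System.out.println("KV cache at full context: "
                + fullContextBytes / (1024 * 1024) + " MiB");
    }
}
```

At the full 32,768-token context this works out to 28,672 bytes per token, roughly 896 MiB of cache on top of the model weights, which is worth budgeting for before enabling long contexts.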

Model Details

| Property | Value |
| --- | --- |
| Architecture | Qwen2ForCausalLM (1.5B parameters, 28 layers, 1536 hidden, 12 heads, 2 KV heads) |
| Task | Text generation (instruction-tuned) |
| Precision | FP16 |
| Context length | 32,768 tokens |
| Vocabulary | 151,936 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
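If you build prompts manually instead of relying on a library's chat-template support, the ChatML layout wraps each turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch (the system message shown is an illustrative placeholder; the tokenizer's own template may add a different default):

```java
// Minimal ChatML prompt assembly. The <|im_start|>/<|im_end|> markers are
// the standard ChatML delimiters; the trailing "assistant" header cues the
// model to generate its reply.
public class ChatMlPrompt {
    static String chatMl(String systemMsg, String userMsg) {
        return "<|im_start|>system\n" + systemMsg + "<|im_end|>\n"
             + "<|im_start|>user\n" + userMsg + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    public static void main(String[] args) {
        System.out.println(chatMl("You are a helpful assistant.", "What is Java?"));
    }
}
```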

License

This model is licensed under the Apache License 2.0. Original model by Qwen Team, Alibaba Cloud.
