Qwen2.5-1.5B-Instruct — ONNX (FP16)

ONNX export of Qwen2.5-1.5B-Instruct (1.5B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.

Converted for use with inference4j, an inference-only AI library for Java.

Original Source

Usage with inference4j

```java
try (var gen = OnnxTextGenerator.qwen2().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
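The KV cache mentioned above stores past attention keys and values so each new token only attends over cached state instead of recomputing the full prefix. Its memory cost follows directly from the table below (28 layers, 2 KV heads via grouped-query attention, head dimension 1536 / 12 = 128, FP16); the sketch below is a generic back-of-envelope estimate, not an inference4j API:

```java
// Back-of-envelope KV cache sizing for Qwen2.5-1.5B-Instruct.
// Values come from the model table; the formula is the standard one for
// grouped-query attention and is not specific to any runtime.
public class KvCacheSize {
    public static void main(String[] args) {
        int layers = 28;
        int kvHeads = 2;             // grouped-query attention: 2 KV heads
        int headDim = 1536 / 12;     // hidden size / attention heads = 128
        int bytesPerValue = 2;       // FP16
        int contextLength = 32_768;

        // Each token stores one key and one value vector per layer per KV head.
        long bytesPerToken = (long) layers * 2 * kvHeads * headDim * bytesPerValue;
        long fullContextBytes = bytesPerToken * contextLength;

        System.out.println("KV cache per token: " + bytesPerToken + " bytes");
        System.out.println("KV cache at full context: "
                + fullContextBytes / (1024 * 1024) + " MiB");
    }
}
```

At the full 32,768-token context this works out to 28,672 bytes per token, roughly 896 MiB of cache on top of the model weights, which is worth budgeting for before enabling long contexts.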

Model Details

| Property | Value |
| --- | --- |
| Architecture | Qwen2ForCausalLM (1.5B parameters, 28 layers, 1536 hidden, 12 heads, 2 KV heads) |
| Task | Text generation (instruction-tuned) |
| Precision | FP16 |
| Context length | 32,768 tokens |
| Vocabulary | 151,936 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
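If you build prompts manually instead of relying on a library's chat-template support, the ChatML layout wraps each turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch (the system message shown is an illustrative placeholder; the tokenizer's own template may add a different default):

```java
// Minimal ChatML prompt assembly. The <|im_start|>/<|im_end|> markers are
// the standard ChatML delimiters; the trailing "assistant" header cues the
// model to generate its reply.
public class ChatMlPrompt {
    static String chatMl(String systemMsg, String userMsg) {
        return "<|im_start|>system\n" + systemMsg + "<|im_end|>\n"
             + "<|im_start|>user\n" + userMsg + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    public static void main(String[] args) {
        System.out.println(chatMl("You are a helpful assistant.", "What is Java?"));
    }
}
```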

License

This model is licensed under the Apache License 2.0. Original model by Qwen Team, Alibaba Cloud.
