# Qwen2.5-1.5B-Instruct → ONNX (FP16)
ONNX export of Qwen2.5-1.5B-Instruct (1.5B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
Converted for use with inference4j, an inference-only AI library for Java.
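The KV cache mentioned above is what makes autoregressive generation efficient: at each decoding step, only the newest token's key/value projections are computed and appended, instead of recomputing them for the entire sequence. The following is a toy sketch of that idea in plain Java — it is an illustration of the caching principle only, not inference4j internals or the actual model math (the `project` function and all numbers are hypothetical).

```java
import java.util.ArrayList;
import java.util.List;

public class KvCacheSketch {
    static final int DIM = 4;

    // Hypothetical stand-in for a key/value projection of one token.
    static float[] project(int tokenId, int seed) {
        float[] v = new float[DIM];
        for (int i = 0; i < DIM; i++) v[i] = (tokenId * 31 + seed + i) % 7;
        return v;
    }

    // Without a cache, step t must recompute K/V for all t tokens seen
    // so far: 1 + 2 + ... + n projections overall (quadratic growth).
    static int projectionsWithoutCache(int seqLen) {
        int ops = 0;
        for (int t = 1; t <= seqLen; t++) ops += t;
        return ops;
    }

    // With a cache, each step computes and appends only the new token's
    // K and V: n projection steps overall (linear growth).
    static int projectionsWithCache(int seqLen, List<float[]> kCache, List<float[]> vCache) {
        int ops = 0;
        for (int t = 0; t < seqLen; t++) {
            kCache.add(project(t, 1)); // only the newest token's K...
            vCache.add(project(t, 2)); // ...and V are computed this step
            ops++;
        }
        return ops;
    }

    public static void main(String[] args) {
        List<float[]> k = new ArrayList<>(), v = new ArrayList<>();
        System.out.println("no cache: " + projectionsWithoutCache(8) + " projection steps");
        System.out.println("cached:   " + projectionsWithCache(8, k, v) + " projection steps");
    }
}
```

The exported ONNX graph realizes the same trade with explicit past-key-value inputs and outputs, which is why the export method below notes "with KV cache".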
## Original Source
- Repository: Qwen/Qwen2.5-1.5B-Instruct
- License: Apache 2.0
## Usage with inference4j

```java
try (var gen = OnnxTextGenerator.qwen2().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen2ForCausalLM (1.5B parameters, 28 layers, 1536 hidden, 12 heads, 2 KV heads) |
| Task | Text generation (instruction-tuned) |
| Precision | FP16 |
| Context length | 32,768 tokens |
| Vocabulary | 151,936 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>` markers) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
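Because the model is instruction-tuned with the ChatML template, raw prompts should be wrapped in `<|im_start|>` / `<|im_end|>` turn markers before tokenization (a generator builder like the one in the usage example would normally do this for you). Below is a minimal sketch of that formatting; the class name, method, and default system prompt are illustrative assumptions, not part of inference4j's API.

```java
public class ChatMlPrompt {
    // Wraps a system prompt and user message in ChatML turn markers,
    // ending with an open assistant turn for the model to complete.
    // Hypothetical helper for illustration; not an inference4j class.
    static String format(String system, String user) {
        return "<|im_start|>system\n" + system + "<|im_end|>\n"
             + "<|im_start|>user\n" + user + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    public static void main(String[] args) {
        System.out.println(format("You are a helpful assistant.", "What is Java?"));
    }
}
```

Generation stops when the model emits `<|im_end|>` to close its assistant turn.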
## License
This model is licensed under the Apache License 2.0. Original model by Qwen Team, Alibaba Cloud.