SmolLM2-1.7B-Instruct → ONNX (FP16)
ONNX export of SmolLM2-1.7B-Instruct (1.7B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
Converted for use with inference4j, an inference-only AI library for Java.
Original Source
- Repository: HuggingFaceTB/SmolLM2-1.7B-Instruct
- License: Apache 2.0
Usage with inference4j
```java
try (var gen = OnnxTextGenerator.smolLM2_1_7B().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
Model Details
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (1.7B parameters, 24 layers, 2048 hidden, 32 heads, 32 KV heads) |
| Task | Text generation (instruction-tuned) |
| Precision | FP16 |
| Context length | 8192 tokens |
| Vocabulary | 49,152 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>` markers) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
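Since the model expects ChatML-formatted input, a prompt for single-turn use can be assembled by hand. The sketch below is a minimal illustration of the ChatML layout, not inference4j's own templating; the system prompt shown is an assumption, so check the model's `tokenizer_config.json` for the authoritative template.

```java
// Minimal sketch of a ChatML prompt for SmolLM2-1.7B-Instruct.
// The system message is an assumed placeholder, not the model's official one.
public class ChatMlPrompt {
    public static String format(String userMessage) {
        return "<|im_start|>system\n"
             + "You are a helpful assistant.<|im_end|>\n" // assumed system prompt
             + "<|im_start|>user\n"
             + userMessage + "<|im_end|>\n"
             + "<|im_start|>assistant\n"; // the model generates from here
    }

    public static void main(String[] args) {
        System.out.println(format("What is Java?"));
    }
}
```

Ending the prompt with an open `<|im_start|>assistant` turn cues the model to generate the assistant reply; generation should stop at the next `<|im_end|>` token.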
License
This model is licensed under the Apache License 2.0. Original model by the HuggingFaceTB team at Hugging Face.