SmolLM2-1.7B-Instruct — ONNX (FP16)

ONNX export of SmolLM2-1.7B-Instruct (1.7B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
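With KV-cache support, each decode step feeds back the attention keys and values of all previous tokens, so only the new token's projections are computed. A toy single-head sketch of that bookkeeping (illustrative only; the exported graph handles this through its `past_key_values` inputs and outputs):

```java
import java.util.ArrayList;
import java.util.List;

// Toy single-head attention with a growing KV cache: each decode step
// appends one key/value pair instead of recomputing the whole prefix.
public class KVCacheDemo {
    public static double[] attend(double[] q, List<double[]> K, List<double[]> V) {
        int n = K.size(), d = q.length;
        double[] w = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double dot = 0;
            for (int j = 0; j < d; j++) dot += q[j] * K.get(i)[j];
            w[i] = Math.exp(dot / Math.sqrt(d));  // scaled dot-product score
            sum += w[i];
        }
        double[] out = new double[d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) out[j] += (w[i] / sum) * V.get(i)[j];
        return out;
    }

    public static void main(String[] args) {
        List<double[]> K = new ArrayList<>(), V = new ArrayList<>();
        for (int step = 0; step < 3; step++) {
            double[] q = {1, 0, 0, 1};
            double[] k = {0.5, step, 0, 1};
            double[] v = {step, 1, 0, 0};
            K.add(k);                          // cache grows by one entry per step
            V.add(v);
            double[] out = attend(q, K, V);    // attends over all cached tokens
            System.out.println("step " + step + ": cache size = " + K.size());
        }
    }
}
```

Without the cache, every step would recompute keys and values for the entire prefix, making generation quadratic in sequence length.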

Converted for use with inference4j, an inference-only AI library for Java.

Original Source

HuggingFaceTB/SmolLM2-1.7B-Instruct on the Hugging Face Hub.

Usage with inference4j

try (var gen = OnnxTextGenerator.smolLM2_1_7B().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
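Under the hood, instruction-tuned SmolLM2 expects prompts in ChatML format. A minimal sketch of how such a prompt is assembled (hypothetical helper; in practice the bundled tokenizer's chat template applies these markers automatically):

```java
// Sketch: manual ChatML prompt construction for SmolLM2-1.7B-Instruct.
// The <|im_start|>/<|im_end|> markers delimit each chat turn; the trailing
// "assistant" header cues the model to generate its reply.
public class ChatMLExample {
    public static String toChatML(String system, String user) {
        return "<|im_start|>system\n" + system + "<|im_end|>\n"
             + "<|im_start|>user\n" + user + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    public static void main(String[] args) {
        System.out.println(toChatML("You are a helpful assistant.", "What is Java?"));
    }
}
```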

Model Details

Property            Value
Architecture        LlamaForCausalLM (1.7B parameters, 24 layers, 2048 hidden size, 32 attention heads, 32 KV heads)
Task                Text generation (instruction-tuned)
Precision           FP16
Context length      8,192 tokens
Vocabulary          49,152 tokens (BPE)
Chat template       ChatML (`<|im_start|>` / `<|im_end|>` markers)
Original framework  PyTorch (transformers)
Export method       Hugging Face Optimum (with KV cache, FP16)
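An export like this can be reproduced with Optimum's CLI. A sketch of the command (exact flag names and the output directory are assumptions and depend on the installed Optimum version):

```shell
# Requires: pip install "optimum[exporters]"
# --task text-generation-with-past exports the decoder with KV-cache
# inputs/outputs so generation reuses past keys/values.
optimum-cli export onnx \
  --model HuggingFaceTB/SmolLM2-1.7B-Instruct \
  --task text-generation-with-past \
  --dtype fp16 \
  smollm2-1.7b-instruct-onnx/
```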

License

This model is licensed under the Apache License 2.0. Original model by Hugging Face (HuggingFaceTB).
