SmolLM2-360M-Instruct (ONNX)

ONNX export of SmolLM2-360M-Instruct (360M parameters) with KV cache support for efficient autoregressive generation.

Converted for use with inference4j, an inference-only AI library for Java.
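The KV cache mentioned above is what keeps autoregressive generation efficient: past attention keys and values are stored between decoding steps instead of being recomputed from the full prefix every time. As a toy illustration (this is a cost sketch, not the actual ONNX session inputs/outputs, and the class and method names are made up for the example):

```java
// Toy cost model: why a KV cache makes autoregressive decoding linear
// in sequence length instead of quadratic. Not real inference code.
public class KvCacheSketch {

    // Without a cache, generating token t re-encodes all t tokens so far,
    // so the total work over n steps is 1 + 2 + ... + n = n(n+1)/2.
    static int tokensProcessedWithoutCache(int newTokens) {
        int total = 0;
        for (int t = 1; t <= newTokens; t++) {
            total += t;
        }
        return total;
    }

    // With a cache, past keys/values are reused, so each decoding step
    // only feeds the single newly generated token through the model.
    static int tokensProcessedWithCache(int newTokens) {
        return newTokens;
    }

    public static void main(String[] args) {
        System.out.println(tokensProcessedWithoutCache(100)); // 5050
        System.out.println(tokensProcessedWithCache(100));    // 100
    }
}
```

For 100 generated tokens the cached path touches 100 token positions versus 5,050 without a cache, which is why the export includes `past_key_values` inputs.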

Original Source

HuggingFaceTB/SmolLM2-360M-Instruct
Usage with inference4j

```java
try (var gen = SmolLM2TextGenerator.builder().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```

Model Details

| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (360M parameters, 32 layers, 960 hidden dim, 15 attention heads, 5 KV heads) |
| Task | Text generation (instruction-tuned) |
| Context length | 8,192 tokens |
| Vocabulary | 49,152 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache) |

License

This model is licensed under the Apache License 2.0. Original model by HuggingFace.
