# SmolLM2-360M-Instruct (ONNX)
ONNX export of SmolLM2-360M-Instruct (360M parameters) with KV cache support for efficient autoregressive generation.
Converted for use with inference4j, an inference-only AI library for Java.
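The KV cache mentioned above lets each decoding step reuse the attention keys and values of all previous tokens instead of recomputing them. Its memory cost follows directly from the model configuration listed below (32 layers, 5 KV heads via grouped-query attention, head dimension 64 = 960 hidden / 15 attention heads). A back-of-envelope sketch, assuming fp32 cache entries (an fp16 export would halve the figures):

```java
// Rough KV cache footprint for SmolLM2-360M, using the configuration
// from the model card: 32 layers, 5 KV heads (GQA), head dim 64.
// fp32 (4 bytes per value) is an assumption about the export precision.
public class KvCacheSize {
    public static void main(String[] args) {
        int layers = 32, kvHeads = 5, headDim = 64, bytesPerValue = 4;
        // Both keys and values are cached, hence the factor of 2.
        long bytesPerToken = 2L * layers * kvHeads * headDim * bytesPerValue;
        long fullContext = bytesPerToken * 8192; // maximum context length
        System.out.printf("KV cache: %d bytes/token, %.1f MiB at full 8192-token context%n",
                bytesPerToken, fullContext / (1024.0 * 1024.0));
    }
}
```

That works out to 80 KiB per token, or about 640 MiB if the full 8192-token context is used; the 5 KV heads (rather than 15) cut this to a third of what full multi-head caching would require.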
## Original Source
- Repository: HuggingFaceTB/SmolLM2-360M-Instruct
- License: Apache 2.0
## Usage with inference4j
```java
try (var gen = SmolLM2TextGenerator.builder().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
## Model Details
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (360M parameters, 32 layers, 960 hidden, 15 heads, 5 KV heads) |
| Task | Text generation (instruction-tuned) |
| Context length | 8192 tokens |
| Vocabulary | 49,152 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>` / `<\|im_end\|>` markers) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache) |
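The ChatML template from the table above wraps each turn in `<|im_start|>` / `<|im_end|>` markers. If you format prompts yourself rather than through a library, the structure looks like this (the system prompt text is illustrative, not prescribed by the model card):

```java
// Minimal ChatML prompt construction for an instruction-tuned model.
// The <|im_start|>/<|im_end|> markers are standard ChatML; the trailing
// "<|im_start|>assistant\n" cues the model to begin its reply.
public class ChatMlPrompt {
    static String format(String system, String user) {
        return "<|im_start|>system\n" + system + "<|im_end|>\n"
             + "<|im_start|>user\n" + user + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    public static void main(String[] args) {
        System.out.print(format("You are a helpful assistant.", "What is Java?"));
    }
}
```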
## License
This model is licensed under the Apache License 2.0. Original model by Hugging Face (HuggingFaceTB).