---
library_name: onnx
tags:
- text-generation
- tinyllama
- llama
- onnx
- inference4j
license: apache-2.0
pipeline_tag: text-generation
---
# TinyLlama-1.1B-Chat — ONNX (FP16)
ONNX export of [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) (1.1B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
Converted for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.
## Original Source
- **Repository:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **License:** Apache 2.0
## Usage with inference4j
```java
// try-with-resources closes the generator when the block exits
try (var gen = OnnxTextGenerator.tinyLlama().build()) {
    GenerationResult result = gen.generate("What is Java?");
    // Print the generated text
    System.out.println(result.text());
}
```
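The model is tuned on the Zephyr chat layout, so a prompt assembled by hand would need the role markers and `</s>` terminators. The sketch below shows that layout as it is commonly documented for TinyLlama-1.1B-Chat-v1.0. The `ZephyrPrompt` class and its `Message` record are hypothetical helpers, not part of inference4j, and this card does not say whether `OnnxTextGenerator` already applies the template internally, so check that before wrapping prompts yourself.

```java
import java.util.List;

public class ZephyrPrompt {
    /** One chat turn; role is expected to be "system", "user", or "assistant". */
    record Message(String role, String content) {}

    /**
     * Renders messages in the Zephyr layout: a "<|role|>" marker line,
     * the content, then the "</s>" end-of-sequence token. A trailing
     * "<|assistant|>" marker asks the model to produce the next reply.
     */
    static String format(List<Message> messages) {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            sb.append("<|").append(m.role()).append("|>\n")
              .append(m.content()).append("</s>\n");
        }
        sb.append("<|assistant|>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example conversation; the system text is an arbitrary illustration.
        System.out.print(format(List.of(
                new Message("system", "You are a helpful assistant."),
                new Message("user", "What is Java?"))));
    }
}
```

Keeping the formatting in one pure function makes it easy to compare the output against the tokenizer's own chat template if the two ever disagree.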
## Model Details
| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM (1.1B parameters, 22 layers, 2048 hidden, 32 heads, 4 KV heads) |
| Task | Text generation (instruction-tuned, Zephyr chat template) |
| Precision | FP16 |
| Context length | 2048 tokens |
| Vocabulary | 32,000 tokens (SentencePiece BPE) |
| Chat template | Zephyr (`<\|user\|>`...`</s>`) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
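The architecture row is enough to estimate how much memory the KV cache needs. The sketch below works through that arithmetic, assuming the cache is stored in FP16 (2 bytes per value) and that the head dimension is hidden size divided by attention heads (2048 / 32 = 64). `KvCacheEstimate` is a hypothetical helper for illustration, and the result is an estimate, not a measurement of this export.

```java
public class KvCacheEstimate {
    /**
     * Bytes of KV cache per generated token: one key and one value vector
     * (the factor 2) for every KV head in every layer.
     */
    static long bytesPerToken(int layers, int kvHeads, int headDim, int bytesPerValue) {
        return 2L * layers * kvHeads * headDim * bytesPerValue;
    }

    public static void main(String[] args) {
        // Values taken from the Model Details table.
        int hidden = 2048, heads = 32, kvHeads = 4, layers = 22;
        int contextLength = 2048;
        int headDim = hidden / heads;   // 64
        int fp16Bytes = 2;              // assumption: cache kept in FP16

        long perToken = bytesPerToken(layers, kvHeads, headDim, fp16Bytes);
        long fullContext = perToken * contextLength;

        System.out.println(perToken + " bytes per token");                            // 22528
        System.out.println(fullContext / (1024 * 1024) + " MiB at full context");     // 44
    }
}
```

Under these assumptions, grouped-query attention with 4 KV heads keeps the full-context cache near 44 MiB. The same formula with 32 KV heads would give eight times that, about 352 MiB.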
## License
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Original model by [TinyLlama](https://huggingface.co/TinyLlama).