---
library_name: onnx
tags:
- text-generation
- tinyllama
- llama
- onnx
- inference4j
license: apache-2.0
pipeline_tag: text-generation
---

# TinyLlama-1.1B-Chat — ONNX (FP16)

ONNX export of [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) (1.1B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.

Converted for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.

## Original Source

- **Repository:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **License:** Apache 2.0

## Usage with inference4j

```java
try (var gen = OnnxTextGenerator.tinyLlama().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```

## Model Details

| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM (1.1B parameters, 22 layers, 2048 hidden, 32 heads, 4 KV heads) |
| Task | Text generation (instruction-tuned, Zephyr chat template) |
| Precision | FP16 |
| Context length | 2048 tokens |
| Vocabulary | 32,000 tokens (SentencePiece BPE) |
| Chat template | Zephyr (`<\|user\|>`...) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |

## License

This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Original model by [TinyLlama](https://huggingface.co/TinyLlama).
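
## KV cache size estimate

The architecture figures listed under Model Details (22 layers, 2048 hidden size, 32 attention heads, 4 KV heads, 2048-token context) are enough to work out roughly how much memory the KV cache needs. The sketch below is plain arithmetic and does not call inference4j. It assumes the head dimension is hidden size divided by attention heads, that the cache is stored in FP16 like the weights, and that the batch size is 1. The class and method names are hypothetical and used for illustration only; the memory a runtime actually allocates may differ.

```java
public final class KvCacheEstimate {

    /**
     * Bytes of KV cache required for a single token position.
     * The factor 2 covers one key tensor and one value tensor per layer.
     */
    public static long bytesPerToken(int layers, int kvHeads, int headDim, int bytesPerValue) {
        return 2L * layers * kvHeads * headDim * bytesPerValue;
    }

    public static void main(String[] args) {
        // Values taken from the Model Details table.
        int layers = 22;
        int hiddenSize = 2048;
        int attentionHeads = 32;
        int kvHeads = 4;
        int contextLength = 2048;

        // Assumption: the cache uses FP16 (2 bytes per value), matching the weights.
        int bytesPerValue = 2;

        // Assumption: head dimension = hidden size / attention heads.
        int headDim = hiddenSize / attentionHeads; // 64

        long perToken = bytesPerToken(layers, kvHeads, headDim, bytesPerValue); // 22528
        long fullContext = perToken * contextLength;                            // 46137344

        System.out.println("Head dimension: " + headDim);
        System.out.println("KV cache per token (bytes): " + perToken);
        System.out.println("KV cache at full context (MiB): " + fullContext / (1024L * 1024L));
    }
}
```

Under these assumptions the cache grows by about 22 KiB per token and reaches about 44 MiB at the full 2048-token context. Because the model has 4 KV heads rather than 32, this is one eighth of what the same layout would need if every attention head kept its own keys and values.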