---
library_name: onnx
tags:
- text-generation
- tinyllama
- llama
- onnx
- inference4j
license: apache-2.0
pipeline_tag: text-generation
---
| |
# TinyLlama-1.1B-Chat — ONNX (FP16)
|
|
ONNX export of [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) (1.1B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
|
|
Converted for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.
|
|
## Original Source
|
|
- **Repository:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **License:** Apache 2.0
|
|
## Usage with inference4j
|
|
```java
try (var gen = OnnxTextGenerator.tinyLlama().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
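The model is instruction-tuned with the Zephyr chat template (see the table below). Whether inference4j applies this template automatically is not stated here, so the sketch below shows how a prompt in that format can be built by hand; the class name `ZephyrPrompt` is illustrative, not part of any library.

```java
// Builds a prompt in the Zephyr chat format used by TinyLlama-1.1B-Chat-v1.0.
// Role markers are <|system|>, <|user|>, <|assistant|>; each turn ends with </s>.
public class ZephyrPrompt {

    /** Formats a system message and a user message into the Zephyr template. */
    static String format(String system, String user) {
        return "<|system|>\n" + system + "</s>\n"
             + "<|user|>\n" + user + "</s>\n"
             + "<|assistant|>\n";
    }

    public static void main(String[] args) {
        String prompt = format("You are a helpful assistant.", "What is Java?");
        System.out.print(prompt);
    }
}
```

A string built this way can be passed directly to `gen.generate(...)` if the generator expects a raw, pre-templated prompt.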
|
|
## Model Details
|
|
| | Property | Value | |
| |----------|-------| |
| | Architecture | LlamaForCausalLM (1.1B parameters, 22 layers, 2048 hidden, 32 heads, 4 KV heads) | |
| | Task | Text generation (instruction-tuned, Zephyr chat template) | |
| | Precision | FP16 | |
| | Context length | 2048 tokens | |
| | Vocabulary | 32,000 tokens (SentencePiece BPE) | |
| Chat template | Zephyr (`<\|user\|>`...`</s>`) |
| | Original framework | PyTorch (transformers) | |
| | Export method | Hugging Face Optimum (with KV cache, FP16) | |
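The figures in the table allow a quick back-of-the-envelope estimate of the FP16 KV-cache memory per sequence, using the standard formula (2 tensors per layer × KV heads × head dimension × sequence length × bytes per value); the numbers plugged in come from the table above.

```java
// Estimates the FP16 KV-cache footprint for one sequence at full context,
// using the model dimensions from the table above.
public class KvCacheSize {
    public static void main(String[] args) {
        long layers = 22;
        long kvHeads = 4;           // grouped-query attention: 4 KV heads
        long headDim = 2048 / 32;   // hidden size / attention heads = 64
        long context = 2048;        // maximum context length in tokens
        long bytesPerValue = 2;     // FP16

        // 2 = one key tensor plus one value tensor per layer
        long bytes = 2 * layers * kvHeads * headDim * context * bytesPerValue;
        System.out.println(bytes + " bytes (~" + bytes / (1024 * 1024) + " MiB)");
        // roughly 44 MiB per sequence at the full 2048-token context
    }
}
```

The 4 KV heads (versus 32 attention heads) cut this cache to an eighth of what full multi-head attention would require.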
|
|
## License
|
|
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Original model by [TinyLlama](https://huggingface.co/TinyLlama).
|
|