---
library_name: onnx
tags:
- text-generation
- tinyllama
- llama
- onnx
- inference4j
license: apache-2.0
pipeline_tag: text-generation
---
| |
# TinyLlama-1.1B-Chat — ONNX (FP16)
|
|
ONNX export of [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) (1.1B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
|
|
Converted for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.
|
|
## Original Source
|
|
- **Repository:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **License:** Apache 2.0
|
|
## Usage with inference4j
|
|
```java
try (var gen = OnnxTextGenerator.tinyLlama().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
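The model is instruction-tuned with the Zephyr chat template (see the table below). Whether inference4j applies this template automatically is not stated here, so the sketch below shows how a prompt in that format can be built by hand; the class name `ZephyrPrompt` is illustrative, not part of any library.

```java
// Builds a prompt in the Zephyr chat format used by TinyLlama-1.1B-Chat-v1.0.
// Role markers are <|system|>, <|user|>, <|assistant|>; each turn ends with </s>.
public class ZephyrPrompt {

    /** Formats a system message and a user message into the Zephyr template. */
    static String format(String system, String user) {
        return "<|system|>\n" + system + "</s>\n"
             + "<|user|>\n" + user + "</s>\n"
             + "<|assistant|>\n";
    }

    public static void main(String[] args) {
        String prompt = format("You are a helpful assistant.", "What is Java?");
        System.out.print(prompt);
    }
}
```

A string built this way can be passed directly to `gen.generate(...)` if the generator expects a raw, pre-templated prompt.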
|
|
## Model Details
|
|
| | Property | Value | |
| |----------|-------| |
| | Architecture | LlamaForCausalLM (1.1B parameters, 22 layers, 2048 hidden, 32 heads, 4 KV heads) | |
| | Task | Text generation (instruction-tuned, Zephyr chat template) | |
| | Precision | FP16 | |
| | Context length | 2048 tokens | |
| | Vocabulary | 32,000 tokens (SentencePiece BPE) | |
| Chat template | Zephyr (`<\|user\|>`...`</s>`) |
| | Original framework | PyTorch (transformers) | |
| | Export method | Hugging Face Optimum (with KV cache, FP16) | |
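The figures in the table allow a quick back-of-the-envelope estimate of the FP16 KV-cache memory per sequence, using the standard formula (2 tensors per layer × KV heads × head dimension × sequence length × bytes per value); the numbers plugged in come from the table above.

```java
// Estimates the FP16 KV-cache footprint for one sequence at full context,
// using the model dimensions from the table above.
public class KvCacheSize {
    public static void main(String[] args) {
        long layers = 22;
        long kvHeads = 4;           // grouped-query attention: 4 KV heads
        long headDim = 2048 / 32;   // hidden size / attention heads = 64
        long context = 2048;        // maximum context length in tokens
        long bytesPerValue = 2;     // FP16

        // 2 = one key tensor plus one value tensor per layer
        long bytes = 2 * layers * kvHeads * headDim * context * bytesPerValue;
        System.out.println(bytes + " bytes (~" + bytes / (1024 * 1024) + " MiB)");
        // roughly 44 MiB per sequence at the full 2048-token context
    }
}
```

The 4 KV heads (versus 32 attention heads) cut this cache to an eighth of what full multi-head attention would require.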
|
|
## License
|
|
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Original model by [TinyLlama](https://huggingface.co/TinyLlama).
|
|