license: apache-2.0

Based on https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

Converted to an ONNX model using https://github.com/microsoft/onnxruntime-genai (commit: 74ec1becd9ca84099c2e8cfaa4798a8a9fc66beb)

Using the following command (run from `onnxruntime-genai\src\python\py\models`):

```shell
python builder.py -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  -o model-output-path -c model-cache-dir -e webgpu -p int4 \
  --extra_options int4_block_size=32 int4_accuracy_level=4 \
  int4_op_types_to_quantize=MatMul/Gather
```
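After conversion, the model can be run with the onnxruntime-genai Python API. The sketch below is a minimal, hedged example: the output path (`model-output-path`) is the one passed to the builder above, the prompt format follows TinyLlama's chat template, and some method names (e.g. `append_tokens`) may differ between onnxruntime-genai versions, so consult the version you have installed.

```python
# Minimal inference sketch with onnxruntime-genai (pip install onnxruntime-genai).
# Assumes the converted model was written to "model-output-path" by builder.py;
# API details may vary across onnxruntime-genai releases.
import onnxruntime_genai as og

model = og.Model("model-output-path")
tokenizer = og.Tokenizer(model)

# TinyLlama-1.1B-Chat uses a Zephyr-style chat template.
prompt = "<|user|>\nWhat is ONNX?</s>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Generate token by token until the model emits EOS or hits max_length.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```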