Use from the Transformers library
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("anindya64/tinyllama-tensorrt", dtype="auto")
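
The snippet above loads only the model weights. For end-to-end text generation with Transformers a tokenizer is needed as well; here is a minimal sketch, assuming the repo ships the usual tokenizer files alongside the TensorRT-LLM artifacts (an engine-only repo may not include them):

# Minimal generation sketch (assumption: tokenizer files and standard HF weights
# are present in the repo; an engine-only TensorRT-LLM repo may lack them).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anindya64/tinyllama-tensorrt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))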
TinyLlama TensorRT-LLM Edition

This repo contains the TensorRT-LLM version of the TinyLlama model. The checkpoint has been converted to run at Float16 (FP16) precision with NVIDIA TensorRT.
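
Because the artifacts here are TensorRT-LLM engines rather than plain Transformers weights, a more direct route is TensorRT-LLM's own Python runtime. Below is a minimal sketch, assuming a recent tensorrt_llm release that exposes the high-level LLM API and that the engine directory has been downloaded locally (the local path is hypothetical):

# Minimal TensorRT-LLM inference sketch (assumptions: a recent tensorrt_llm
# release with the high-level LLM API; "./tinyllama-trt-engine" is a
# hypothetical local copy of this repo's FP16 engine directory).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="./tinyllama-trt-engine")  # load the prebuilt FP16 engine
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)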
