Qwen3-4B-AWQ TensorRT-EdgeLLM Engine

This repository contains the TensorRT-EdgeLLM engine artifacts for Qwen/Qwen3-4B-AWQ on Jetson Orin.

Runtime compatibility:

TensorRT: 10.3.0.30-1+cuda12.5
CUDA runtime: JetPack host CUDA 12.x
Target GPU: Orin, compute capability 8.7
TensorRT-Edge-LLM commit: 7d6579c8293c5cc62986238aa200ea9c7b57d50a
Engine config: max batch size 1, max input length 3072, KV cache capacity 4096
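
The limits above imply a per-request token budget. The following is a minimal sketch, not part of the engine or the TensorRT-EdgeLLM API: the function name `max_new_tokens` is hypothetical, and it assumes the KV cache capacity of 4096 covers prompt tokens and generated tokens together.

```python
# Engine limits copied from the runtime compatibility list above.
MAX_BATCH_SIZE = 1
MAX_INPUT_LEN = 3072
KV_CACHE_CAPACITY = 4096


def max_new_tokens(prompt_tokens: int) -> int:
    """Return the generation budget left for a prompt of the given length.

    Assumption: the KV cache holds prompt and generated tokens together,
    so the budget is the cache capacity minus the prompt length.
    """
    if prompt_tokens < 1:
        raise ValueError("prompt must contain at least one token")
    if prompt_tokens > MAX_INPUT_LEN:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, engine accepts at most {MAX_INPUT_LEN}"
        )
    return KV_CACHE_CAPACITY - prompt_tokens
```

Under that assumption, a prompt at the maximum input length still leaves room for 1024 generated tokens, and a client can reject oversized prompts before sending them to the engine.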

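TensorRT engines are tied to the TensorRT version and GPU architecture they were built for, so use this engine only with a TensorRT-EdgeLLM runtime and plugin build that matches the versions listed above. The edge-llm-chat-service container consumes it via EDGELLM_ENGINE_REPO.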
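
A deployment script could compare the host against the values listed under runtime compatibility before loading the engine. The sketch below is illustrative only: `release_triple` and `mismatches` are hypothetical helpers, the caller is assumed to supply the installed TensorRT version, the GPU compute capability, and the runtime's commit hash, and matching on the major.minor.patch part of the TensorRT version is an assumption rather than a documented rule.

```python
# Expected values copied from the runtime compatibility list above.
EXPECTED_TENSORRT = "10.3.0.30-1+cuda12.5"
EXPECTED_COMPUTE_CAPABILITY = (8, 7)
EXPECTED_COMMIT = "7d6579c8293c5cc62986238aa200ea9c7b57d50a"


def release_triple(version: str) -> tuple[int, ...]:
    """Extract major.minor.patch from a version such as '10.3.0.30-1+cuda12.5'."""
    numeric = version.split("-")[0]  # drop the package suffix, e.g. '-1+cuda12.5'
    return tuple(int(part) for part in numeric.split(".")[:3])


def mismatches(trt_version: str, compute_capability: tuple[int, int], commit: str) -> list[str]:
    """Return a human-readable list of incompatibilities (empty if none)."""
    problems = []
    if release_triple(trt_version) != release_triple(EXPECTED_TENSORRT):
        problems.append(f"TensorRT {trt_version} != {EXPECTED_TENSORRT}")
    if tuple(compute_capability) != EXPECTED_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} != {EXPECTED_COMPUTE_CAPABILITY}")
    # Accept abbreviated hashes of at least 7 characters.
    if len(commit) < 7 or not EXPECTED_COMMIT.startswith(commit):
        problems.append(f"runtime commit {commit} != {EXPECTED_COMMIT}")
    return problems
```

For example, `mismatches("10.3.0", (8, 7), "7d6579c")` returns an empty list, while a host with a different TensorRT release or GPU yields one message per mismatch.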
