Qwen3-4B-AWQ TensorRT-EdgeLLM Engine

This repository contains the TensorRT-EdgeLLM engine artifacts for Qwen/Qwen3-4B-AWQ on Jetson Orin.

Runtime compatibility:

  • TensorRT: 10.3.0.30-1+cuda12.5
  • CUDA runtime: JetPack host CUDA 12.x
  • Target GPU: Orin, compute capability 8.7
  • TensorRT-Edge-LLM commit: 7d6579c8293c5cc62986238aa200ea9c7b57d50a
  • Engine config: max batch size 1, max input length 3072 tokens, KV cache capacity 4096 tokens
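
The engine limits above bound what a single request can ask for. The following is a minimal sketch, not part of the runtime, that checks a request against them before it is sent. It assumes the limits are measured in tokens and that the KV cache capacity covers prompt plus generated tokens. The function name `generation_budget` is hypothetical.

```python
# Limits copied from the engine config listed above.
MAX_BATCH_SIZE = 1
MAX_INPUT_LEN = 3072       # assumed to be in tokens
KV_CACHE_CAPACITY = 4096   # assumed to cover prompt + generated tokens


def generation_budget(prompt_tokens: int, batch_size: int = 1) -> int:
    """Return how many new tokens fit after the prompt, or raise ValueError.

    Hypothetical helper: the real runtime may enforce these limits differently.
    """
    if batch_size < 1 or batch_size > MAX_BATCH_SIZE:
        raise ValueError(f"batch size {batch_size} exceeds max {MAX_BATCH_SIZE}")
    if prompt_tokens < 1 or prompt_tokens > MAX_INPUT_LEN:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens is outside 1..{MAX_INPUT_LEN}"
        )
    # Whatever the prompt does not occupy in the KV cache is left for output.
    return KV_CACHE_CAPACITY - prompt_tokens


if __name__ == "__main__":
    for n in (512, 3072):
        print(n, "prompt tokens ->", generation_budget(n), "tokens left to generate")
```

Under these assumptions a maximum-length prompt still leaves room for 1024 generated tokens, and shorter prompts leave proportionally more.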

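The engine must be used with a TensorRT-EdgeLLM runtime and plugin build that matches the versions listed above; a serialized TensorRT engine is tied to the TensorRT version and GPU architecture it was built for. The edge-llm-chat-service container consumes it via the EDGELLM_ENGINE_REPO setting.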
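
One way to catch a mismatch early is to compare the target device against the pinned values before loading the engine. The sketch below is illustrative only. The function name `compatibility_problems` and its parameters are hypothetical, and how the detected values are obtained (package manager, build log, device query) is left to the caller.

```python
# Pinned values copied from the "Runtime compatibility" list above.
EXPECTED_TENSORRT_PKG = "10.3.0.30-1+cuda12.5"
EXPECTED_CUDA_MAJOR = "12"
EXPECTED_COMPUTE_CAPABILITY = (8, 7)
EXPECTED_COMMIT = "7d6579c8293c5cc62986238aa200ea9c7b57d50a"


def compatibility_problems(tensorrt_pkg, cuda_version, compute_capability, commit):
    """Return a list of human-readable mismatches (empty if all values match).

    Hypothetical helper: all four arguments are supplied by the caller.
    """
    problems = []
    if tensorrt_pkg != EXPECTED_TENSORRT_PKG:
        problems.append(f"TensorRT {tensorrt_pkg!r} != {EXPECTED_TENSORRT_PKG!r}")
    # Only the CUDA major version is pinned (12.x).
    if cuda_version.split(".")[0] != EXPECTED_CUDA_MAJOR:
        problems.append(f"CUDA {cuda_version!r} is not {EXPECTED_CUDA_MAJOR}.x")
    if tuple(compute_capability) != EXPECTED_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {tuple(compute_capability)} != (8, 7)")
    # Accept abbreviated hashes of at least 7 characters.
    commit = commit.lower()
    if len(commit) < 7 or not EXPECTED_COMMIT.startswith(commit):
        problems.append(f"commit {commit!r} does not match {EXPECTED_COMMIT[:12]}...")
    return problems


if __name__ == "__main__":
    issues = compatibility_problems("10.3.0.30-1+cuda12.5", "12.6", (8, 7), "7d6579c")
    print("compatible" if not issues else "\n".join(issues))
```

Returning a list rather than raising on the first mismatch lets a deployment script report every problem in one pass.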
