Qwen3-4B-AWQ TensorRT-EdgeLLM Engine
This repository contains the TensorRT-EdgeLLM engine artifacts for Qwen/Qwen3-4B-AWQ, built for Jetson Orin.
Runtime compatibility:
- TensorRT: 10.3.0.30-1+cuda12.5
- CUDA runtime: JetPack host CUDA 12.x
- Target GPU: Orin, compute capability 8.7
- TensorRT-Edge-LLM commit: 7d6579c8293c5cc62986238aa200ea9c7b57d50a
- Engine config: max batch size 1, max input length 3072, KV cache capacity 4096
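A minimal pre-flight sketch that checks a host and a request against the requirements listed above before the engine is loaded. The function names are hypothetical and not part of TensorRT-Edge-LLM. Two readings are assumptions: that the installed TensorRT must match 10.3.0 exactly (TensorRT engines are generally tied to the version that built them), and that the KV cache capacity bounds prompt tokens plus generated tokens for the single sequence.

```python
# Hypothetical pre-flight checks; the limits are copied from this model card.
REQUIRED_TRT_VERSION = "10.3.0"       # engine built with TensorRT 10.3.0.30
REQUIRED_COMPUTE_CAPABILITY = (8, 7)  # Jetson Orin
MAX_BATCH_SIZE = 1
MAX_INPUT_LEN = 3072
KV_CACHE_CAPACITY = 4096  # assumption: prompt + generated tokens per sequence


def runtime_problems(trt_version: str, compute_capability: tuple) -> list:
    """Return a list of human-readable mismatches (empty if compatible)."""
    problems = []
    # Accept both "10.3.0" (Python module) and "10.3.0.30-1+cuda12.5" (deb package).
    major_minor_patch = ".".join(trt_version.split("-")[0].split(".")[:3])
    if major_minor_patch != REQUIRED_TRT_VERSION:
        problems.append(
            f"TensorRT {trt_version} found, engine requires {REQUIRED_TRT_VERSION}"
        )
    if tuple(compute_capability) != REQUIRED_COMPUTE_CAPABILITY:
        problems.append(
            f"compute capability {tuple(compute_capability)} found, "
            f"engine requires {REQUIRED_COMPUTE_CAPABILITY}"
        )
    return problems


def request_fits(batch_size: int, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check a request against the engine's build-time limits."""
    return (
        batch_size <= MAX_BATCH_SIZE
        and prompt_tokens <= MAX_INPUT_LEN
        and prompt_tokens + max_new_tokens <= KV_CACHE_CAPACITY
    )
```

On a Jetson device you could pass `tensorrt.__version__` and, if PyTorch is installed, `torch.cuda.get_device_capability()` as the two arguments to `runtime_problems`. Under the assumption above, a full 3072-token prompt leaves room for at most 1024 generated tokens.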
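The engine must be loaded by a TensorRT-EdgeLLM runtime/plugin build that matches the versions above (the same TensorRT version and TensorRT-Edge-LLM commit); engines built for one configuration are not guaranteed to load under another.
It is consumed by the edge-llm-chat-service container via EDGELLM_ENGINE_REPO.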
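A sketch of how a service might resolve and download the engine named by EDGELLM_ENGINE_REPO. It assumes, without confirmation from this card, that EDGELLM_ENGINE_REPO is an environment variable holding a Hugging Face repo id; the helper names are hypothetical and do not describe how edge-llm-chat-service actually works. Only `huggingface_hub.snapshot_download` is a real library call.

```python
import os
from typing import Mapping, Optional

# Hypothetical default: this repository's id.
DEFAULT_ENGINE_REPO = "harvestsu/Qwen3-4B-AWQ-TensorRT-EdgeLLM-engine"


def resolve_engine_repo(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the repo id from EDGELLM_ENGINE_REPO, or the default if unset/blank."""
    env = os.environ if env is None else env
    value = env.get("EDGELLM_ENGINE_REPO", "").strip()
    return value or DEFAULT_ENGINE_REPO


def fetch_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Download the engine repo and return its local snapshot path."""
    # Imported lazily so resolve_engine_repo() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=resolve_engine_repo(env))
```

Calling `fetch_engine()` on the device downloads the repository into the local Hugging Face cache and returns that directory, which could then be handed to the TensorRT-EdgeLLM runtime.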