---
license: mit
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- onnx
- onnxruntime-genai
- oga
---

My tests (Tesla P4):
- CUDA int4: 2179 MiB memory, 6 tokens/s
- CUDA fp16: 4221 MiB memory, 21 tokens/s
- CUDA fp32: did not finish (out of memory)
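The tokens/s figures above are presumably generated tokens divided by wall-clock decode time, but the exact measurement method is not stated. The sketch below shows one way to take such a measurement. It assumes a decode loop shaped like the onnxruntime-genai generator loop (`while not generator.is_done(): generator.generate_next_token()`). The helper name `measure_tps` and the fake model in the usage section are hypothetical and exist only for illustration. Prompt processing (prefill) is not timed separately here, so results may differ from the author's numbers.

```python
import time
from typing import Callable


def measure_tps(is_done: Callable[[], bool],
                generate_next_token: Callable[[], None],
                max_tokens: int = 256) -> tuple[int, float]:
    """Run a decode loop and return (tokens generated, tokens per second).

    `is_done` and `generate_next_token` are assumed to behave like the
    methods of an onnxruntime-genai Generator; any callables work.
    """
    generated = 0
    start = time.perf_counter()
    while not is_done() and generated < max_tokens:
        generate_next_token()  # assumption: one call produces one new token
        generated += 1
    elapsed = time.perf_counter() - start
    tps = generated / elapsed if elapsed > 0 else 0.0
    return generated, tps


if __name__ == "__main__":
    # Hypothetical stand-in "model": ~50 ms per token, stops after 20 tokens.
    state = {"n": 0}

    def fake_is_done() -> bool:
        return state["n"] >= 20

    def fake_step() -> None:
        time.sleep(0.05)
        state["n"] += 1

    n, tps = measure_tps(fake_is_done, fake_step)
    print(f"{n} tokens, {tps:.1f} tokens/s")  # about 20 tokens/s or a bit below
```

To benchmark a real model, pass the generator's own `is_done` and `generate_next_token` methods after the prompt tokens have been appended. Note that the memory column is a separate measurement (for example from `nvidia-smi`) and is not covered by this helper.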