---
license: mit
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- onnx
- onnxruntime-genai
- oga
---

My Tests (Tesla P4)

- CUDA int4: 2179 MiB, 6 TPS
- CUDA fp16: 4221 MiB, 21 TPS
- CUDA fp32: did not finish (out of memory)
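
The card does not say how the TPS figures were measured. As a rough sketch, assuming TPS means generated tokens divided by wall-clock seconds for one run, a measurement helper could look like the following. `measure_tps` and `fake_generate` are hypothetical names for illustration; `fake_generate` is only a stand-in for a real token stream and does not call onnxruntime-genai.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_tps(
    generate: Callable[[], Iterable[int]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Return (token_count, tokens_per_second) for one generation run.

    Assumption: TPS = tokens produced / elapsed wall-clock seconds,
    with prompt processing time included in the elapsed time.
    """
    start = clock()
    count = sum(1 for _ in generate())  # consume the stream, counting tokens
    elapsed = clock() - start
    tps = count / elapsed if elapsed > 0 else float("inf")
    return count, tps


def fake_generate() -> Iterable[int]:
    # Hypothetical stand-in for a model's token stream: yields 20 token ids
    # with a small artificial delay between them.
    for token_id in range(20):
        time.sleep(0.001)
        yield token_id


if __name__ == "__main__":
    count, tps = measure_tps(fake_generate)
    print(f"{count} tokens, {tps:.1f} TPS")
```

To benchmark a real model, `fake_generate` would be replaced by a function that yields tokens from the actual generation loop; the injectable `clock` argument is there only to make the helper easy to test.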