---
license: mit
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- onnx
- onnxruntime-genai
- oga
---
My Tests (Tesla P4)
- CUDA int4: 2179 MiB, 6 TPS
- CUDA fp16: 4221 MiB, 21 TPS
- CUDA fp32: did not finish (out of memory)
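
The card does not say how the TPS figures above were measured. As a rough sketch only, one common way to get a tokens-per-second number is to time a full pass over a decode loop and divide the token count by the elapsed wall-clock time. The helper `tokens_per_second` and the stand-in `fake_stream` below are hypothetical names used for illustration; they are not part of onnxruntime-genai, and a real measurement would replace `fake_stream` with the actual generation loop.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(token_stream: Callable[[], Iterable[object]]) -> float:
    """Time one full pass over a token stream and return tokens per second."""
    start = time.perf_counter()
    count = sum(1 for _ in token_stream())  # consume every generated token
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_stream():
    # Hypothetical stand-in for a real decode loop: yields 20 "tokens",
    # sleeping about 1 ms before each one to simulate generation time.
    for i in range(20):
        time.sleep(0.001)
        yield i


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_stream):.1f} TPS")
```

Timing the whole loop folds prompt processing into the result, so short runs will understate steady-state decode speed.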