---
base_model:
  - openai/clip-vit-large-patch14
base_model_relation: quantized
pipeline_tag: zero-shot-image-classification
tags:
  - quantized
  - hardware-optimized
  - clip
  - vision
  - tensordyne
license: apache-2.0
---

πŸ“ Overview

Tensordyne builds advanced AI-inference systems, enabling faster, more affordable, and sustainable generative AI.

This repository provides resources to quickly get started with CLIP ViT-Large (`openai/clip-vit-large-patch14`) on the Tensordyne Inference System and its SDK.

## 🧩 Model Details

- Quantization: post-training quantization of the base model; no fine-tuning or additional training was performed
- Supported data types: Tensordyne FP16 (tFP16), Tensordyne FP8 (tFP8), mixed precision

βš™οΈ Quantization

The Tensordyne SDK offers multiple post-training quantization strategies to convert AI models for efficient inference on the Tensordyne Inference System, fully customizable for your optimization targets.
Here we showcase several preselected quantization variants that can be applied on the fly to quantize the model to Tensordyne data types. The calibration-based strategies are defined by quantization configurations provided as `.json` files.

The quantized models are evaluated on a subset of the ImageNet-1k test set. A negative relative accuracy drop indicates that the quantized model performs better than the float base model.

| Model Configuration | Top-1 Accuracy [%] | Relative Top-1 Accuracy Drop vs. IEEE FP32 | Details |
|---|---|---|---|
| IEEE FP32 | 71.36 | – | The baseline model trained in IEEE FP32 |
| `calibration_based_tFP16` | 71.34 | 0.02 % | calibration-based tFP16 quantization |
| `layerwise_mixed_precision` | 71.20 | 0.22 % | calibration-based mixed precision: tFP8, outliers in tFP16 |
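The relative-drop column follows from the absolute Top-1 numbers. As a quick sketch of the arithmetic (assuming the column is the standard relative difference against the FP32 baseline):

```python
def relative_top1_drop(baseline_pct: float, quantized_pct: float) -> float:
    """Relative Top-1 accuracy drop in percent vs. the float baseline."""
    return (baseline_pct - quantized_pct) / baseline_pct * 100.0

# layerwise_mixed_precision row: (71.36 - 71.20) / 71.36 * 100 ≈ 0.22 %
print(round(relative_top1_drop(71.36, 71.20), 2))  # → 0.22
```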

## 🚀 Getting Started

Refer to the Tensordyne Hugging Face Hub tutorial for instructions on using the artifacts provided in this repository.
Our hosted documentation provides more information on Tensordyne's quantization strategies and introduces you to our SDK.
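For a quick reference point, the float base model can be exercised with the standard Hugging Face `transformers` zero-shot pipeline. This is a minimal sketch of the upstream baseline only, not of the Tensordyne SDK flow, and the image path is a placeholder:

```python
from transformers import pipeline

# Zero-shot image classification with the float base model
# (the Tensordyne-quantized variants are run via the SDK instead).
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14",
)

predictions = classifier(
    "example.jpg",  # placeholder: path or URL to an input image
    candidate_labels=["a photo of a cat", "a photo of a dog"],
)
# predictions is a list of {"label", "score"} dicts, highest score first
print(predictions[0]["label"])
```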