Model Speed Comparator

Input Text

The movie was terrible
Best experience ever!
It was okay, nothing special
Completely disappointed


Speedup achieved — Quantized INT8 vs PyTorch baseline
Smallest model: — · Fastest: —

Session Stats

Latency Comparison (ms) — lower is better

Live Latency Trend (last 10 runs)

Model Size (MB) — lower is better

20-Run Benchmark Latency (ms)

Model	Average	Min	Max	P95
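The columns above can be computed with a small timing harness. The sketch below is one way to do it, not the page's actual code: the function names (percentile, summarize, benchmark, speedup), the warm-up count, and the nearest-rank definition of P95 are all assumptions made for illustration, and the speedup is assumed to be the ratio of average latencies.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed definition; other methods interpolate)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Reduce raw per-run latencies to the four table columns."""
    return {
        "average": sum(latencies_ms) / len(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "p95": percentile(latencies_ms, 95),
    }


def benchmark(predict: Callable[[str], object], text: str,
              runs: int = 20, warmup: int = 3) -> Dict[str, float]:
    """Time `predict(text)` for `runs` iterations after a few untimed warm-up calls."""
    for _ in range(warmup):
        predict(text)
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

Here `predict` stands in for any of the compared models wrapped as a function of the input text. The warm-up calls keep one-time costs such as lazy initialization out of the measured runs.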

What This Demonstrates

ONNX Export: Converts the PyTorch model to ONNX, a portable format that is independent of the training framework. ONNX Runtime then applies graph optimizations (operator fusion, memory planning) that PyTorch's eager mode does not apply by default.
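To make "operator fusion" concrete, here is a toy illustration in plain Python. It is not ONNX Runtime's implementation, and every name in it is hypothetical. It shows the idea only: three elementwise operators (scale, bias add, ReLU) run either as three separate passes with an intermediate buffer each, or as one fused pass that produces the same result.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def run_unfused(ops: List[Op], values: List[float]) -> Tuple[List[float], int]:
    """Apply each operator in its own pass, materializing a buffer per operator."""
    buffers_allocated = 0
    for op in ops:
        values = [op(v) for v in values]  # full pass over the data
        buffers_allocated += 1
    return values, buffers_allocated


def fuse(ops: List[Op]) -> Op:
    """Combine a chain of elementwise operators into a single operator."""
    def fused(v: float) -> float:
        for op in ops:
            v = op(v)
        return v
    return fused


def run_fused(ops: List[Op], values: List[float]) -> Tuple[List[float], int]:
    """Apply the fused operator in one pass with a single output buffer."""
    kernel = fuse(ops)
    return [kernel(v) for v in values], 1


# Toy "graph": y = relu(2 * x + 1)
ops: List[Op] = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(0.0, v)]
inputs = [-3.0, -0.5, 0.0, 2.0]

unfused_out, unfused_buffers = run_unfused(ops, inputs)
fused_out, fused_buffers = run_fused(ops, inputs)
```

In a real runtime the benefit comes from fewer kernel launches and less traffic to memory for intermediate tensors; the buffer counts here are only a stand-in for that cost.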

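INT8 Quantization: Reduces weight precision from 32-bit floats to 8-bit integers. Quantized weights take roughly 4x less space, so less data moves through memory per inference and latency usually drops, typically with only a small accuracy loss on most NLP tasks.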
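The arithmetic behind the 4x figure can be sketched in a few lines. The example below assumes symmetric per-tensor quantization with a single scale, which is one common scheme and not necessarily the one this demo uses. The weight values and function names are made up for illustration.

```python
from array import array
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized: array, scale: float) -> List[float]:
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, stored first as 32-bit floats and then as signed bytes.
weights = [0.5, -1.27, 0.031, 0.9, -0.44, 1.0]
fp32 = array("f", weights)
int8, scale = quantize_int8(weights)
restored = dequantize(int8, scale)

fp32_bytes = len(fp32) * fp32.itemsize   # 4 bytes per weight
int8_bytes = len(int8) * int8.itemsize   # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"FP32: {fp32_bytes} bytes, INT8: {int8_bytes} bytes")
print(f"scale = {scale:.4f}, max round-trip error = {max_error:.4f}")
```

Each weight's rounding error is at most half the scale, which is why accuracy usually degrades only slightly. A full model typically shrinks by somewhat less than 4x, because scales and any layers left in floating point still take space.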

Why it matters: AI accelerator teams (like HCL's) optimize model inference for deployment at scale. Format conversion, graph optimization, and quantization are standard building blocks of production ML inference.