TCompress Model Benchmark Report

Quantization Type: QAT (Quantization Aware Training)
Precision: BF16
Evaluation Dataset: Salesforce/wikitext (wikitext-2-raw-v1)


Performance Metrics

| Metric | Result |
| --- | --- |
| Total Tokens Evaluated | 1,000,000 |
| Latency (Mean) | 58.12 ms |
| Throughput | 17,206.3 tok/s |
| Peak GPU Memory | 1,174.4 MB |
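
The report does not describe how latency and throughput were measured. As a rough sketch only, a harness like the one below times repeated calls and derives both numbers from the same measurements. The names `benchmark`, `step`, and `tokens_per_call` are hypothetical, and on a GPU the `step` callable would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`) before the timer stops.

```python
import time
from statistics import mean
from typing import Callable, Tuple


def benchmark(step: Callable[[], None], tokens_per_call: int,
              warmup: int = 3, iters: int = 20) -> Tuple[float, float]:
    """Return (mean latency in ms, throughput in tokens/s) for `step`.

    `step` is a hypothetical callable that runs one forward pass over
    `tokens_per_call` tokens; it is an assumption, not the report's code.
    """
    for _ in range(warmup):          # warm-up calls are not timed
        step()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        latencies.append(time.perf_counter() - start)
    mean_s = mean(latencies)
    return mean_s * 1000.0, tokens_per_call / mean_s


def implied_tokens_per_call(latency_ms: float, tok_per_s: float) -> float:
    """Tokens processed per call if both metrics come from the same calls."""
    return latency_ms / 1000.0 * tok_per_s


# With the reported figures this comes out very close to 1,000 tokens per call.
print(implied_tokens_per_call(58.12, 17_206.3))
```

If both reported metrics were taken over the same calls, their product suggests each timed call processed about 1,000 tokens; the report itself does not state the batch or sequence size.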

Model Accuracy

| Variant | Agreement vs Base | Flipped Tokens |
| --- | --- | --- |
| TCompress (bf16) | 94.92% | 50,829 |
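
The agreement figure is consistent with simple arithmetic on the table: 50,829 flipped tokens out of the 1,000,000 evaluated gives 1 - 50,829 / 1,000,000, or about 94.92%. The Python sketch below reproduces that calculation. It assumes, since the report does not say so explicitly, that a "flipped" token is a position where the variant's top-1 prediction differs from the base model's; the function names are illustrative.

```python
from typing import Sequence, Tuple


def agreement_from_counts(total_tokens: int, flipped_tokens: int) -> float:
    """Fraction of evaluated positions where variant and base agree."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 1.0 - flipped_tokens / total_tokens


def agreement_rate(base_ids: Sequence[int],
                   variant_ids: Sequence[int]) -> Tuple[float, int]:
    """Compare two aligned sequences of predicted token ids.

    Returns (agreement fraction, number of flipped positions).
    Assumption: both sequences hold top-1 predictions for the same inputs.
    """
    if len(base_ids) != len(variant_ids):
        raise ValueError("sequences must have the same length")
    flipped = sum(1 for b, v in zip(base_ids, variant_ids) if b != v)
    return agreement_from_counts(len(base_ids), flipped), flipped


# Figures from the table above.
print(f"{agreement_from_counts(1_000_000, 50_829):.2%}")  # prints 94.92%
```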

Storage & Packaging

| Asset | Size |
| --- | --- |
| Model Weight File | 298.0 MB |
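
As a sanity check on the file size: BF16 stores each parameter in 2 bytes, so a 298.0 MB weight file corresponds to roughly 149 million parameters, assuming decimal megabytes and negligible header overhead. The sketch below does that arithmetic and also shows one way to count parameters directly from a Safetensors file header (an 8-byte little-endian length followed by a JSON header). The function names are illustrative, and it assumes the weights are stored as a single `.safetensors` file.

```python
import json
import struct
from math import prod
from typing import Dict


def params_from_size(size_mb: float, bytes_per_param: int = 2) -> float:
    """Estimate parameter count from file size (decimal MB, BF16 = 2 bytes)."""
    return size_mb * 1_000_000 / bytes_per_param


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def params_by_dtype(header: dict) -> Dict[str, int]:
    """Sum element counts per dtype, skipping the optional metadata entry."""
    counts: Dict[str, int] = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + prod(info["shape"])
    return counts


print(f"{params_from_size(298.0) / 1e6:.0f}M parameters")  # prints 149M parameters
```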

Implementation Details

This model has been optimized via Quantization Aware Training (QAT) to maintain high fidelity with the FP32 base model (94.92% token agreement) while reducing memory footprint (1,174.4 MB peak GPU memory, 298.0 MB weight file) and improving throughput (17,206.3 tok/s) on CUDA-enabled hardware.
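
The report does not describe the QAT recipe. To illustrate why a BF16 model can disagree with an FP32 base on some tokens, the sketch below emulates rounding an FP32 value to BF16 (which keeps the FP32 exponent but only 7 explicit mantissa bits) using round-to-nearest-even. Whether TCompress's training simulates exactly this rounding is an assumption, and `round_to_bf16` is a hypothetical helper, not part of the model's code.

```python
import math
import struct


def round_to_bf16(x: float) -> float:
    """Round a value to the nearest BF16-representable number.

    The input is first converted to FP32 (struct raises OverflowError if it
    does not fit), then the low 16 bits are rounded away, ties to even.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1.0, 0.1, 3.14159):
    rounded = round_to_bf16(value)
    rel_err = abs(rounded - value) / abs(value)
    print(f"{value} -> {rounded} (relative error {rel_err:.2e})")
```

Small per-weight rounding errors of this kind accumulate through the network and can change which token has the highest score, which is what the agreement metric captures.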

Safetensors: model size 0.1B params, tensor type BF16