TCompress Model Benchmark Report

Quantization Type: QAT (Quantization-Aware Training)
Precision: BF16
Evaluation Dataset: Salesforce/wikitext (wikitext-2-raw-v1)

Performance Metrics

| Metric | Result |
| --- | --- |
| Total Tokens Evaluated | 1,000,000 |
| Latency (Mean) | 58.12 ms |
| Throughput | 17,206.3 tok/s |
| Peak GPU Memory | 1,174.4 MB |
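
The latency and throughput figures can be cross-checked against each other. The sketch below assumes, without confirmation from the report, that latency is measured per timed forward call and that throughput is the number of tokens in a call divided by that latency. The function names are hypothetical.

```python
def tokens_per_call(throughput_tok_s: float, mean_latency_ms: float) -> float:
    """Tokens processed per timed call, assuming throughput = tokens / latency."""
    return throughput_tok_s * (mean_latency_ms / 1000.0)


def total_eval_seconds(total_tokens: int, throughput_tok_s: float) -> float:
    """Wall-clock time spent in timed calls at a constant throughput."""
    return total_tokens / throughput_tok_s


# Figures taken from the table above.
per_call = tokens_per_call(17_206.3, 58.12)
eval_seconds = total_eval_seconds(1_000_000, 17_206.3)

print(f"tokens per timed call: {per_call:.1f}")
print(f"estimated time in timed calls: {eval_seconds:.1f} s")
```

Under these assumptions the two figures imply roughly 1,000 tokens per timed call, which is worth confirming against the actual benchmark configuration.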

Model Accuracy

| Variant | Agreement vs Base | Flipped Tokens |
| --- | --- | --- |
| TCompress (bf16) | 94.92% | 50,829 |
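
The report does not define how agreement is computed. A minimal sketch, assuming "agreement" is the fraction of positions where the compressed model predicts the same token as the base model and "flipped tokens" is the count of positions where they differ, could look like this. The function name `token_agreement` and the toy token ids are illustrative only.

```python
def token_agreement(base_tokens, variant_tokens):
    """Return (agreement_fraction, flipped_count) for two equal-length token sequences."""
    if len(base_tokens) != len(variant_tokens):
        raise ValueError("sequences must have the same length")
    if not base_tokens:
        raise ValueError("sequences must be non-empty")
    # A position is "flipped" when the two models predict different tokens.
    flipped = sum(1 for b, v in zip(base_tokens, variant_tokens) if b != v)
    return 1.0 - flipped / len(base_tokens), flipped


# Toy example with made-up token ids: one of four positions differs.
agreement, flipped = token_agreement([5, 9, 2, 7], [5, 9, 3, 7])

# Cross-check of the reported figures: 50,829 flips over 1,000,000 tokens.
reported_agreement = 1.0 - 50_829 / 1_000_000
print(f"{reported_agreement:.2%}")
```

With this definition, the reported flip count and token total reproduce the 94.92% figure in the table.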

Storage & Packaging

| Asset | Size |
| --- | --- |
| Model Weight File | 298.0 MB |
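
The file size gives a rough handle on parameter count. The sketch below assumes that MB means 10^6 bytes, that every weight is stored as BF16 (2 bytes), and that file header overhead is negligible; none of this is stated in the report, and the helper names are hypothetical.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}


def estimate_param_count(file_size_mb: float, dtype: str = "bf16") -> int:
    """Approximate parameter count from a weight file size in decimal megabytes."""
    return int(file_size_mb * 1_000_000 / BYTES_PER_PARAM[dtype])


def estimate_file_size_mb(param_count: int, dtype: str) -> float:
    """Approximate weight file size in decimal megabytes for a given dtype."""
    return param_count * BYTES_PER_PARAM[dtype] / 1_000_000


params = estimate_param_count(298.0, "bf16")
fp32_size_mb = estimate_file_size_mb(params, "fp32")

print(f"estimated parameters: {params / 1e6:.0f}M")
print(f"same weights stored as FP32: {fp32_size_mb:.1f} MB")
```

Under these assumptions the file holds on the order of 149 million parameters, and the same weights stored as FP32 would take about twice the space.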

Implementation Details

This model was optimized with Quantization-Aware Training (QAT) to stay close to the base FP32 model (94.92% token agreement on the evaluation set) while reducing memory footprint and increasing throughput on CUDA-enabled hardware. In this benchmark it peaked at 1,174.4 MB of GPU memory and sustained 17,206.3 tok/s.

Safetensors

Model size: 0.1B params

Tensor type: BF16
