sapiens2-pose-0.8b INT4-G128

Packed 4-bit derivative of facebook/sapiens2-pose-0.8b.

This artifact stores large floating-point weight tensors as symmetric per-group INT4 with a group size of 128. Norms, biases, positional/RoPE tensors, and small tensors are kept in their source dtype. It is a storage format with a dequantizing loader for the current official Sapiens2 code path, not an AWQ, GGUF, or NVFP4 LLM artifact.
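For readers unfamiliar with the scheme, the following is a minimal NumPy sketch of symmetric per-group INT4 quantization with two values packed per byte. It is illustrative only: the clipping range of [-7, 7], the float32 scales, the low-nibble-first packing order, and the function names are assumptions, and the actual layout used in this artifact is defined by load_sapiens2_int4.py.

```python
import numpy as np

GROUP_SIZE = 128  # matches the group size stated above


def quantize_int4_symmetric(w, group_size=GROUP_SIZE):
    """Quantize a 1-D float array whose length is a multiple of group_size."""
    groups = w.astype(np.float32).reshape(-1, group_size)
    # One scale per group; symmetric integer range [-7, 7] (assumed convention).
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(groups / scales), -7, 7).astype(np.int8)
    # Shift to unsigned nibbles and pack two per byte, low nibble first (assumed).
    u = (q + 8).astype(np.uint8).reshape(-1)
    packed = u[0::2] | (u[1::2] << 4)
    return packed, scales.reshape(-1)


def dequantize_int4_symmetric(packed, scales, group_size=GROUP_SIZE):
    """Invert quantize_int4_symmetric up to rounding error."""
    low = (packed & 0x0F).astype(np.int8) - 8
    high = (packed >> 4).astype(np.int8) - 8
    q = np.stack([low, high], axis=1).reshape(-1, group_size)
    return (q.astype(np.float32) * scales[:, None]).reshape(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
packed, scales = quantize_int4_symmetric(w)
w_hat = dequantize_int4_symmetric(packed, scales)
print(packed.nbytes, scales.shape, float(np.abs(w - w_hat).max()))
```

In this sketch each weight costs 4 bits plus a share of its group's scale, and the reconstruction error of any element is bounded by half of its group's scale.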

Files

  • facebook__sapiens2-pose-0.8b-int4-g128.safetensors: packed INT4 safetensors artifact.
  • load_sapiens2_int4.py: loader that reconstructs a PyTorch state dict for the official Sapiens2 model code.
  • config.json and preprocessor_config.json: copied from the source repo.
  • quantization_report.json: build report.
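To see which tensors the packed file contains without loading any weights, the safetensors header can be read directly. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper name read_safetensors_header is hypothetical, and the tensor names and dtypes it prints depend on how this artifact was built.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} from a safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is not a tensor entry.
    header.pop("__metadata__", None)
    return header


def tensor_nbytes(entry):
    """Size in bytes of one tensor, derived from its data offsets."""
    begin, end = entry["data_offsets"]
    return end - begin
```

A typical use would be `header = read_safetensors_header("facebook__sapiens2-pose-0.8b-int4-g128.safetensors")` followed by a loop over `header.items()` printing each name, dtype, and shape.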

Quantization Report

  • Source revision: 2d5be15b2cdc2bac5d9ea8a61849c3ba2c52f42a
  • Group size: 128
  • Source bytes: 3394819360
  • Artifact bytes: 439862416
  • Compression ratio: 7.7179x
  • Tensors: 560
  • Quantized tensors: 200
  • Maximum per-tensor mean absolute error (MAE) in the dequantization smoke test: 0.00429335
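The reported compression ratio follows directly from the two byte counts, as the short check below shows. The bits-per-weight line is an idealized estimate that assumes one 16-bit scale per 128-weight group; the report does not state the scale dtype.

```python
source_bytes = 3_394_819_360
artifact_bytes = 439_862_416

ratio = source_bytes / artifact_bytes
print(f"{ratio:.4f}x")  # 7.7179x

# Share of tensors that were quantized (200 of 560).
quantized_fraction = 200 / 560
print(f"{quantized_fraction:.1%}")  # 35.7%

# Idealized cost per quantized weight: 4 bits plus an assumed 16-bit scale
# shared by each group of 128 weights.
bits_per_weight = 4 + 16 / 128
print(bits_per_weight)  # 4.125
```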

Fidelity Validation

The packed INT4 artifact was dequantized back to floating-point tensors and compared against the source checkpoint.

  • Validation gate: global floating-tensor similarity >= 90.00%
  • Result: PASS
  • Global floating-tensor similarity: 99.425898%
  • Minimum large-tensor cosine: 0.974259198
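A comparison of this kind can be sketched as follows. The code computes per-tensor MAE and cosine similarity between a source tensor and its dequantized counterpart, then takes the worst case across tensors. The exact definition of "global floating-tensor similarity" is not given above, so it is not reproduced here; the function names and the noisy stand-in for dequantized weights are illustrative assumptions.

```python
import numpy as np


def mean_abs_error(ref, test):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(ref.astype(np.float64) - test.astype(np.float64))))


def cosine_similarity(ref, test):
    """Cosine of the angle between two tensors viewed as flat vectors."""
    a = ref.astype(np.float64).ravel()
    b = test.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # Undefined for zero vectors; treat identical tensors as a perfect match.
        return 1.0 if np.array_equal(a, b) else 0.0
    return float(a @ b / denom)


def compare_state_dicts(source, dequantized):
    """Per-tensor metrics for every name present in the source dict."""
    return {
        name: {
            "mae": mean_abs_error(ref, dequantized[name]),
            "cosine": cosine_similarity(ref, dequantized[name]),
        }
        for name, ref in source.items()
    }


rng = np.random.default_rng(0)
source = {"layer.weight": rng.standard_normal((64, 128)).astype(np.float32)}
# Stand-in for dequantized weights: the source plus small noise.
dequantized = {
    k: v + 0.01 * rng.standard_normal(v.shape).astype(np.float32)
    for k, v in source.items()
}
report = compare_state_dicts(source, dequantized)
min_cosine = min(r["cosine"] for r in report.values())
max_mae = max(r["mae"] for r in report.values())
print(min_cosine, max_mae)
```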

Loading

from load_sapiens2_int4 import load_state_dict

state_dict = load_state_dict("facebook__sapiens2-pose-0.8b-int4-g128.safetensors", device="cpu")
# Then instantiate the matching official Sapiens2 architecture and load:
# model.load_state_dict(state_dict, strict=True)

Limitations

This is a verified packed-weight artifact with a dequantizing loader. It does not include native INT4 CUDA kernels for Sapiens2, so the loader reconstructs floating-point weights before inference. Runtime speedups would require a Sapiens2-specific kernel or export path and should be benchmarked separately.
