sapiens2-pose-5b INT4-G128
Packed 4-bit derivative of facebook/sapiens2-pose-5b.
This artifact uses symmetric per-group INT4 packing with group size 128 for large floating-point weight tensors. Norms, biases, positional/RoPE tensors, and small tensors are kept in their source dtype. It is a storage and runtime-loader quantization for the current official Sapiens2 code path, not an AWQ, GGUF, or NVFP4 LLM artifact.
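The packing scheme described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code that built this artifact: the function names, the max-abs/7 scale rule, the low-nibble-first byte order, and the float32 scale dtype are all assumptions. The actual on-disk layout is whatever load_sapiens2_int4.py expects.

```python
import numpy as np

GROUP_SIZE = 128  # group size stated in this model card


def quantize_int4_symmetric(weights, group_size=GROUP_SIZE):
    """Hypothetical helper: symmetric per-group INT4 quantization plus nibble packing."""
    flat = np.asarray(weights, dtype=np.float32).ravel()
    assert flat.size % group_size == 0, "sketch assumes size is a multiple of group_size"
    groups = flat.reshape(-1, group_size)
    # Symmetric: zero point fixed at 0, one scale per group mapping max |w| to 7 (assumed rule).
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(groups / scales), -8, 7).astype(np.int8)
    # Keep the low 4 bits (two's complement) and pack two values per byte, low nibble first.
    nibbles = (q & 0x0F).astype(np.uint8).ravel()
    packed = nibbles[0::2] | (nibbles[1::2] << 4)
    return packed, scales


def dequantize_int4_symmetric(packed, scales, group_size=GROUP_SIZE):
    """Hypothetical helper: unpack nibbles, sign-extend, and rescale to float32."""
    low = (packed & 0x0F).astype(np.int8)
    high = (packed >> 4).astype(np.int8)
    # Nibble values 8..15 encode -8..-1.
    low = np.where(low > 7, low - 16, low)
    high = np.where(high > 7, high - 16, high)
    q = np.empty(packed.size * 2, dtype=np.int8)
    q[0::2] = low
    q[1::2] = high
    return (q.reshape(-1, group_size).astype(np.float32) * scales).ravel()


rng = np.random.default_rng(0)
w = rng.standard_normal(4 * GROUP_SIZE).astype(np.float32)
packed, scales = quantize_int4_symmetric(w)
w_hat = dequantize_int4_symmetric(packed, scales)
print(packed.nbytes, "bytes packed, MAE:", float(np.abs(w - w_hat).mean()))
```

With round-to-nearest, the reconstruction error of each weight in this sketch is bounded by half of its group's scale, which is why smaller groups trade extra scale storage for lower error.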
Files
- facebook__sapiens2-pose-5b-int4-g128.safetensors: packed INT4 safetensors artifact.
- load_sapiens2_int4.py: loader that reconstructs a PyTorch state dict for the official Sapiens2 model code.
- config.json and preprocessor_config.json: copied from the source repo.
- quantization_report.json: build report.
Quantization Report
- Source revision: dfc4f28877b041f50e9c55663bf29ed38f9c18a1
- Group size: 128
- Source bytes: 20480794652
- Artifact bytes: 2647416860
- Compression ratio: 7.7361x
- Tensors: 968
- Quantized tensors: 344
- Max tensor MAE during dequant smoke: 0.00373528
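The reported compression ratio can be re-derived from the byte counts in this report. The short check below also compares it with an idealized bound, which rests on two assumptions not stated in the report: that the source weights are 32-bit floats and that each group of 128 values carries one 16-bit scale.

```python
# Figures copied from the quantization report.
source_bytes = 20_480_794_652
artifact_bytes = 2_647_416_860
total_tensors = 968
quantized_tensors = 344
group_size = 128

ratio = source_bytes / artifact_bytes
print(f"measured ratio: {ratio:.4f}x")  # prints "measured ratio: 7.7361x"

# Idealized bound if every weight were quantized.
# ASSUMPTIONS: 32-bit source weights and one 16-bit scale per group.
source_bits = 32
scale_bits = 16
bits_per_weight = 4 + scale_bits / group_size
ideal_ratio = source_bits / bits_per_weight
print(f"ideal ratio under these assumptions: {ideal_ratio:.4f}x")

# Share of tensors that were packed; the rest stay in their source dtype.
quantized_fraction = quantized_tensors / total_tensors
print(f"quantized tensors: {quantized_fraction:.1%}")
```

Under these assumptions the measured ratio sits slightly below the idealized figure, which is consistent with norms, biases, and other small tensors being stored unquantized.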
Fidelity Validation
The packed INT4 artifact was dequantized back to floating-point tensors and compared against the source checkpoint.
- Validation gate: global floating-tensor similarity >= 90.00%
- Result: PASS
- Global floating-tensor similarity: 99.327831%
- Minimum large-tensor cosine: 0.986056328
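A comparison of this kind could look roughly like the sketch below. The card does not define how "global floating-tensor similarity" is computed, so this example assumes it is the cosine similarity over all tensors taken together. The function name compare_tensors and the returned keys are hypothetical, and the sketch does not filter for "large" tensors.

```python
import numpy as np


def compare_tensors(source, dequantized, gate=0.90):
    """Compare two {name: array} dicts. Metric definitions here are assumptions."""
    dot = norm_a = norm_b = 0.0
    min_cosine, max_mae = 1.0, 0.0
    for name, ref in source.items():
        a = np.asarray(ref, dtype=np.float64).ravel()
        b = np.asarray(dequantized[name], dtype=np.float64).ravel()
        ab, aa, bb = float(a @ b), float(a @ a), float(b @ b)
        if aa > 0 and bb > 0:
            # Per-tensor cosine; all-zero tensors are skipped.
            min_cosine = min(min_cosine, ab / float(np.sqrt(aa * bb)))
        max_mae = max(max_mae, float(np.abs(a - b).mean()))
        dot, norm_a, norm_b = dot + ab, norm_a + aa, norm_b + bb
    # Assumed definition: cosine similarity of all tensors concatenated into one vector.
    if norm_a > 0 and norm_b > 0:
        global_similarity = dot / float(np.sqrt(norm_a * norm_b))
    else:
        global_similarity = 0.0
    return {
        "global_similarity": global_similarity,
        "min_cosine": min_cosine,
        "max_mae": max_mae,
        "passed": global_similarity >= gate,
    }


# Toy tensors standing in for the source and dequantized checkpoints.
source = {"w": np.array([1.0, 2.0, 3.0, 4.0]), "b": np.array([0.5, -0.5])}
dequantized = {"w": source["w"] + 0.01, "b": source["b"]}
print(compare_tensors(source, dequantized))
```

Weight-level similarity of this kind says nothing by itself about downstream pose accuracy, which would need to be evaluated on real inputs.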
Loading
from load_sapiens2_int4 import load_state_dict
state_dict = load_state_dict("facebook__sapiens2-pose-5b-int4-g128.safetensors", device="cpu")
# Then instantiate the matching official Sapiens2 architecture and load:
# model.load_state_dict(state_dict, strict=True)
Limitations
This is a verified packed-weight artifact with a dequantizing loader. It does not claim native INT4 CUDA kernels for Sapiens2 yet. Runtime speedups require a Sapiens2-specific kernel/export path and should be benchmarked separately.
Model tree for Reza2kn/sapiens2-pose-5b-INT4-G128
Base model
facebook/sapiens2-pretrain-5b