# Tensorizer Header Length Parser PoC
This repository contains a local-only, non-destructive proof of concept for a
Tensorizer `.tensors` parser issue.
The PoC demonstrates that changing the first per-tensor header length in a
valid Tensorizer file from `79` to `128` causes `TensorDeserializer` to return
the expected key (`weight`) but load its serialized value as `[0.0]` instead of
`[1.0]`. The same result is observed when loading with `verify_hash=True`.
## Files
- `valid_minimal.tensors`: baseline Tensorizer file containing one float32
tensor named `weight` with value `[1.0]`.
- `malformed_header_len.tensors`: a copy of the baseline file in which only the
  first per-tensor header length is changed, from `79` to `128`.
- `repro_tensorizer_header_len.py`: local reproduction harness.
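To check that the mutated file differs from the baseline only in the header
length field, the two files can be compared byte by byte. The following is a
minimal standard-library sketch; `differing_offsets` and `main` are
hypothetical helpers, not part of `repro_tensorizer_header_len.py`, and the
script deliberately makes no assumption about the Tensorizer on-disk layout.

```python
"""Report the byte offsets where the baseline and mutated files differ.

Hypothetical helper, not part of the repository. It compares raw bytes only
and does not decode any Tensorizer structure.
"""
from pathlib import Path


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every offset where a and b differ, including any length tail."""
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    # Bytes present in only the longer input also count as differences.
    diffs.extend(range(common, max(len(a), len(b))))
    return diffs


def main(baseline: str, mutated: str) -> int:
    a_path, b_path = Path(baseline), Path(mutated)
    if not (a_path.is_file() and b_path.is_file()):
        print("missing input file(s); run from the repository directory")
        return 1
    a, b = a_path.read_bytes(), b_path.read_bytes()
    print(f"sizes: {len(a)} vs {len(b)}")
    for off in differing_offsets(a, b):
        # Print raw byte values (or None past the end of the shorter file).
        old = a[off] if off < len(a) else None
        new = b[off] if off < len(b) else None
        print(f"offset {off}: {old} -> {new}")
    return 0


if __name__ == "__main__":
    main("valid_minimal.tensors", "malformed_header_len.tensors")
```

Per the description above, the reported offsets should fall within the first
per-tensor header length field. The sketch prints raw byte values rather than
decoding that field, because it does not assume the field's width or byte
order.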
## Reproduce
```bash
python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip setuptools wheel
.venv/bin/python -m pip install 'tensorizer==2.12.1'
.venv/bin/python repro_tensorizer_header_len.py
```
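Expected relevant output (the `[1.0]` line comes from the valid baseline file;
the two `[0.0]` lines come from the mutated file, loaded with and without
`verify_hash=True`):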
```text
original_header_len=79
mutated_header_len=128
first_tensor_value=[1.0]
first_tensor_value=[0.0]
first_tensor_value=[0.0]
```
## SHA-256
```text
01cd57e67442b0bcabd2f69230c0fd4a876f78a357afef1a6534328123497956 malformed_header_len.tensors
9d3d8919d307bcc8c6b5ee0f1d7635e1ee47b7e772aa0f7e737614277c9ab2a0 valid_minimal.tensors
```