---
library_name: gguf
tags:
- gguf
- llama-cpp
- security
- proof-of-concept
- parser-divergence
- model-integrity
- huntr
- protectai
license: other
---

# GGUF-PY-F004 Tensor Offset Aliasing

Payload repository for Huntr / ProtectAI triage.

## Finding

Parser divergence: the Python `GGUFReader` accepts non-sequential tensor offsets, which causes silent tensor data aliasing, while native llama.cpp rejects the same GGUF file.

## Primary PoC Model

`poc_GGUF-PY-F004_alias.gguf`

## Primary PoC SHA256

`484eb1c0a3583b7af469d977d33fd4e39d6ae3b92897d1f6382454b9a7daf9de`
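
Before reproducing, it may be worth confirming that a local copy of the PoC matches the digest above. The following is a minimal sketch using only the Python standard library; the helper name `sha256_of` and the assumption that the file sits in the current directory are illustrative, not part of the proof script.

```python
import hashlib
from pathlib import Path

# Digest published in this README for the primary PoC model.
EXPECTED_SHA256 = "484eb1c0a3583b7af469d977d33fd4e39d6ae3b92897d1f6382454b9a7daf9de"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: the PoC file has been downloaded next to this script.
    poc = Path("poc_GGUF-PY-F004_alias.gguf")
    if poc.exists():
        print("match" if sha256_of(poc) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{poc} not found")
```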

## Proof Script

`prove_f004_live_repo.py`

## Confirmed Behavior

The crafted GGUF file contains two tensors:

- `tensor_a`
- `tensor_b`

Both tensors declare the same tensor data offset:

```text
tensor_a: data_offset = 0
tensor_b: data_offset = 0
```

Python `GGUFReader` accepts the file and loads both tensor names, but both tensors read from the same underlying bytes.

Observed Python result:

```text
tensor_a_actual = [1.100000023841858, 2.200000047683716, 3.299999952316284, 4.400000095367432]
tensor_b_actual = [1.100000023841858, 2.200000047683716, 3.299999952316284, 4.400000095367432]
tensor_b_expected = [5.5, 6.6, 7.7, 8.8]
alias_confirmed = True
tensor_b_wrong_data = True
MARKER_GGUF_PY_F004_ALIAS_CONFIRMED_LIVE_REPO
```
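
The aliasing effect can be illustrated without a GGUF file at all. The sketch below builds a hypothetical tensor data region (`tensor_a`, padding to a 32-byte boundary, then `tensor_b`) and decodes four float32 values at a given offset. The layout, the helper names, and the 32-byte alignment are assumptions chosen to match the outputs shown here; this is not the reader's actual code.

```python
import struct

ALIGNMENT = 32  # assumed: GGUF default alignment applies to the PoC


def pad(n, align=ALIGNMENT):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def read_f32(blob, offset, count):
    """Decode `count` little-endian float32 values starting at `offset`."""
    return list(struct.unpack_from(f"<{count}f", blob, offset))


# Hypothetical data region: tensor_a, zero padding, then tensor_b.
a_bytes = struct.pack("<4f", 1.1, 2.2, 3.3, 4.4)
b_bytes = struct.pack("<4f", 5.5, 6.6, 7.7, 8.8)
region = a_bytes.ljust(pad(len(a_bytes)), b"\x00") + b_bytes

tensor_a = read_f32(region, 0, 4)
# tensor_b declares offset 0, so it decodes tensor_a's bytes.
tensor_b_aliased = read_f32(region, 0, 4)
# Reading at the padded end of tensor_a yields the intended values.
tensor_b_intended = read_f32(region, pad(len(a_bytes)), 4)

print("alias:", tensor_a == tensor_b_aliased)
print("wrong data:", tensor_b_aliased != tensor_b_intended)
```

The float32 round trip also accounts for values such as `1.100000023841858` in the observed output: `1.1` is not exactly representable in 32 bits.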

Native `llama-gguf` rejects the same file:

```text
gguf_init_from_file_ptr: tensor 'tensor_b' has offset 0, expected 32
gguf_init_from_file_ptr: failed to read tensor data
EXIT_CODE=134
```
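
The native error message is consistent with a rule that each tensor must begin where the previous tensor, padded to the alignment, ends. The following Python approximation of that rule is a sketch only, not a port of the C++ code. The 16-byte tensor size (four float32 values) and the 32-byte alignment are inferred from the outputs above, and `check_sequential_offsets` is a hypothetical name.

```python
GGUF_DEFAULT_ALIGNMENT = 32  # assumed alignment for the PoC file


def pad(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def check_sequential_offsets(tensors, alignment=GGUF_DEFAULT_ALIGNMENT):
    """Check that tensor offsets are sequential.

    `tensors` is an iterable of (name, relative_offset, n_bytes) tuples in
    file order. Returns a list holding the first mismatch as a message, or
    an empty list when every offset matches the running expected value.
    """
    expected = 0
    for name, offset, n_bytes in tensors:
        if offset != expected:
            return [f"tensor '{name}' has offset {offset}, expected {expected}"]
        expected += pad(n_bytes, alignment)
    return []


# The PoC layout as described in this README: both tensors claim offset 0.
poc_layout = [("tensor_a", 0, 16), ("tensor_b", 0, 16)]
for message in check_sequential_offsets(poc_layout):
    print(message)
```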

## Impact

This demonstrates a parser divergence and a model-file data integrity failure.

A Python-based GGUF scanner, converter, validator, or ingestion pipeline that uses `GGUFReader` can accept a malformed GGUF file that native llama.cpp rejects, while silently mapping one tensor name to another tensor's data.

This is not a code execution claim. The confirmed impact is silent tensor data aliasing, that is, a model integrity failure in Python GGUF processing.
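
A Python pipeline could guard against this class of file by rejecting tensors whose byte ranges overlap before trusting their contents. The sketch below is one possible check, not a fix shipped in gguf-py. `find_aliased_tensors` and `FakeTensor` are hypothetical names, and the code assumes each tensor object exposes `name`, `data_offset`, and `n_bytes` attributes.

```python
from collections import namedtuple

# Stand-in for a reader's tensor record; only the three fields used below.
FakeTensor = namedtuple("FakeTensor", "name data_offset n_bytes")


def find_aliased_tensors(tensors):
    """Return (earlier_name, later_name) pairs whose byte ranges overlap."""
    spans = sorted(
        (t.data_offset, t.data_offset + t.n_bytes, t.name)
        for t in tensors
        if t.n_bytes > 0  # empty tensors occupy no bytes and cannot alias
    )
    overlaps = []
    max_end, max_name = None, None
    for start, end, name in spans:
        # A span starting before the furthest end seen so far overlaps it.
        if max_end is not None and start < max_end:
            overlaps.append((max_name, name))
        if max_end is None or end > max_end:
            max_end, max_name = end, name
    return overlaps


poc = [FakeTensor("tensor_a", 0, 16), FakeTensor("tensor_b", 0, 16)]
print(find_aliased_tensors(poc))
```

If the tensor entries returned by `GGUFReader` expose those same three attributes, the function could be applied to `reader.tensors` directly; that should be verified against the installed gguf-py version.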

## Key Evidence Files

- `PYTHON/PYTHON_LIVE_REPO_OUTPUT.txt`
- `NATIVE/NATIVE_LIVE_REPO_OUTPUT.txt`
- `RAW_OUTPUT/final_repro_output.txt`
- `RAW_OUTPUT/key_markers.txt`
- `SOURCE/gguf_reader_tensor_offset_excerpt.txt`
- `SOURCE/gguf_cpp_tensor_offset_validation_excerpt.txt`
- `SOURCE/live_repo_status_and_diff_check.txt`
- `ENVIRONMENT/ENVIRONMENT.txt`
- `ENVIRONMENT/ENVIRONMENT_LIVE_REPO_CONFIRMED.txt`
- `SHA256SUMS.txt`

## Scope

Confirmed against:

- Repository: `ggerganov/llama.cpp`
- Commit: `a290ce626663dae1d54f70bce3ca6d8f67aab62f`
- Native version: 9046 (a290ce626)
- Python component: `gguf-py/gguf/gguf_reader.py`
- Affected function: `GGUFReader._build_tensors()`