01data-ai / gguf_py_f004_tensor_offset_aliasing

	---
	library_name: gguf
	tags:
	- gguf
	- llama-cpp
	- security
	- proof-of-concept
	- parser-divergence
	- model-integrity
	- huntr
	- protectai
	license: other
	---

	# GGUF-PY-F004 Tensor Offset Aliasing

	Payload repository for Huntr / ProtectAI triage.

	## Finding

	Parser divergence: the Python `GGUFReader` accepts non-sequential tensor offsets, which causes silent tensor data aliasing, while native llama.cpp rejects the same GGUF file.

	## Primary PoC Model

	`poc_GGUF-PY-F004_alias.gguf`

	## Primary PoC SHA256

	`484eb1c0a3583b7af469d977d33fd4e39d6ae3b92897d1f6382454b9a7daf9de`
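
Before running the proof script, the downloaded file can be checked against the digest above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical and are not part of this repository or of gguf-py.

```python
import hashlib

# Digest published in this README for poc_GGUF-PY-F004_alias.gguf.
EXPECTED_SHA256 = "484eb1c0a3583b7af469d977d33fd4e39d6ae3b92897d1f6382454b9a7daf9de"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True if the file's digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("poc_GGUF-PY-F004_alias.gguf")` should return `True` only for an unmodified copy of the PoC file.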

	## Proof Script

	`prove_f004_live_repo.py`

	## Confirmed Behavior

	The crafted GGUF file contains two tensors:

	- `tensor_a`
	- `tensor_b`

	Both tensors declare the same tensor data offset:

	```text
	tensor_a: data_offset = 0
	tensor_b: data_offset = 0
	```

	Python GGUFReader accepts the file and loads both tensor names, but both tensors read from the same underlying bytes.

	Observed Python result:

	```text
	tensor_a_actual = [1.100000023841858, 2.200000047683716, 3.299999952316284, 4.400000095367432]
	tensor_b_actual = [1.100000023841858, 2.200000047683716, 3.299999952316284, 4.400000095367432]
	tensor_b_expected = [5.5, 6.6, 7.7, 8.8]
	alias_confirmed = True
	tensor_b_wrong_data = True
	MARKER_GGUF_PY_F004_ALIAS_CONFIRMED_LIVE_REPO
	```

	Native llama-gguf rejects the same file:

	```text
	gguf_init_from_file_ptr: tensor 'tensor_b' has offset 0, expected 32
	gguf_init_from_file_ptr: failed to read tensor data
	EXIT_CODE=134
	```

	## Impact

	This demonstrates a parser divergence and model-file data integrity failure.

	A Python-based GGUF scanner, converter, validator, or ingestion pipeline using GGUFReader can accept a malformed GGUF file that native llama.cpp rejects, while silently mapping one tensor name to another tensor's data.

	This is not a code execution claim. The confirmed impact is silent tensor data aliasing / model integrity failure in Python GGUF processing.
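
A Python pipeline could guard against this class of file by re-deriving the offsets it expects before trusting the tensor data. The sketch below mirrors the kind of check the native error message implies (each tensor must start where the previous one ended, rounded up to the alignment). It is an illustration, not the llama.cpp implementation: `TensorInfo`, `align_up`, and `check_sequential_offsets` are hypothetical names, the 16-byte tensor size is inferred from the four float32 values shown in the Python output, and the alignment of 32 is the GGUF default, assumed here because the native message reports an expected offset of 32.

```python
from dataclasses import dataclass

# GGUF default alignment; a file may override it via the general.alignment key.
GGUF_DEFAULT_ALIGNMENT = 32


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical stand-in for one tensor-info entry (not a gguf-py type)."""
    name: str
    offset: int   # offset relative to the start of the tensor data section
    n_bytes: int  # size of the tensor's data in bytes


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def check_sequential_offsets(tensors, alignment=GGUF_DEFAULT_ALIGNMENT):
    """Return one message per tensor whose offset is not the expected one."""
    errors = []
    expected = 0
    for t in tensors:
        if t.offset != expected:
            errors.append(
                f"tensor '{t.name}' has offset {t.offset}, expected {expected}"
            )
        # The next tensor must start after this one, padded to the alignment.
        expected += align_up(t.n_bytes, alignment)
    return errors


# Layout assumed for the PoC: two 4-element float32 tensors, both at offset 0.
poc = [TensorInfo("tensor_a", 0, 16), TensorInfo("tensor_b", 0, 16)]
print(check_sequential_offsets(poc))
```

Under these assumptions the sketch flags `tensor_b` for the same reason the native loader does: 16 bytes of `tensor_a` padded to 32 gives an expected offset of 32, not 0. A scanner built on GGUFReader could run an equivalent check on the offsets it parses and refuse the file when the list is non-empty.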

	## Key Evidence Files

	- `PYTHON/PYTHON_LIVE_REPO_OUTPUT.txt`
	- `NATIVE/NATIVE_LIVE_REPO_OUTPUT.txt`
	- `RAW_OUTPUT/final_repro_output.txt`
	- `RAW_OUTPUT/key_markers.txt`
	- `SOURCE/gguf_reader_tensor_offset_excerpt.txt`
	- `SOURCE/gguf_cpp_tensor_offset_validation_excerpt.txt`
	- `SOURCE/live_repo_status_and_diff_check.txt`
	- `ENVIRONMENT/ENVIRONMENT.txt`
	- `ENVIRONMENT/ENVIRONMENT_LIVE_REPO_CONFIRMED.txt`
	- `SHA256SUMS.txt`

	## Scope

	Confirmed against:

	- Repository: ggerganov/llama.cpp
	- Commit: `a290ce626663dae1d54f70bce3ca6d8f67aab62f`
	- Native version: 9046 (a290ce626)
	- Python component: `gguf-py/gguf/gguf_reader.py`
	- Affected function: `GGUFReader._build_tensors()`